SYNTHETIC INSECTICIDAL CRYSTAL PROTEIN GENE



FIELD OF THE INVENTION

This invention relates to the field of bacterial molecular biology and, in particular, to genetic engineering by recombinant technology for the purpose of protecting plants from insect pests. Disclosed herein are the chemical synthesis of a modified crystal protein gene from Bacillus thuringiensis var. tenebrionis (Btt) and the selective expression of this synthetic insecticidal gene. Also disclosed is the transfer of the cloned synthetic gene into a host microorganism, rendering the organism capable of producing, at improved levels of expression, a protein having toxicity to insects. This invention facilitates the genetic engineering of bacteria and plants to attain desired expression levels of novel toxins having agronomic value.

BACKGROUND OF THE INVENTION



B. thuringiensis (Bt) is unique in its ability to produce, during the process of sporulation, proteinaceous crystalline inclusions which are found to be highly toxic to several insect pests of agricultural importance. The crystal proteins of different Bt strains have a rather narrow host range and hence are used commercially as very selective biological insecticides. Numerous strains of Bt are toxic to lepidopteran and dipteran insects. Recently two subspecies (or varieties) of Bt have been reported to be pathogenic to coleopteran insects: var. tenebrionis (Krieg et al. (1983) Z. Angew. Entomol. 96:500-508) and var. san diego (Herrnstadt et al. (1986) Biotechnol. 4:305-308). Both strains produce flat rectangular crystal inclusions and have a major crystal component of 64-68 kDa (Herrnstadt et al. supra; Bernhard (1986) FEMS Microbiol. Lett. 33:261-265).

Toxin genes from several subspecies of Bt have been cloned, and the recombinant clones were found to be toxic to lepidopteran and dipteran insect larvae. The two coleopteran-active toxin genes have also been isolated and expressed. Herrnstadt et al. supra cloned a 3.8 kb BamHI fragment of Bt var. san diego DNA. The protein expressed in E. coli was toxic to P. luteola (elm leaf beetle) and had a molecular weight of approximately 83 kDa. This 83 kDa toxin product from the var. san diego gene was larger than the 64 kDa crystal protein isolated from Bt var. san diego cells, suggesting that the Bt var. san diego crystal protein may be synthesized as a larger precursor molecule that is processed by Bt var. san diego but not by E. coli prior to being formed into a crystal.

Sekar et al. (1987) Proc. Natl. Acad. Sci. USA 84:7036-7040 and U.S. Patent Application 108,285, filed October 13, 1987, cloned the crystal protein gene from Btt and determined its nucleotide sequence. This crystal protein gene was contained on a 5.9 kb BamHI fragment (pNSBP544). A subclone containing the 3 kb HindIII fragment from pNSBP544 was constructed. This HindIII fragment contains an open reading frame (ORF) that encodes a 644-amino acid polypeptide of approximately 73 kDa. Extracts of both subclones exhibited toxicity to larvae of the Colorado potato beetle (Leptinotarsa decemlineata, a coleopteran insect). 73- and 65-kDa peptides that cross-reacted with an antiserum against the crystal protein of var. tenebrionis were produced on expression in E. coli. Sporulating var. tenebrionis cells contain an immunoreactive 73-kDa peptide that corresponds to the expected product from the ORF of pNSBP544. However, isolated crystals primarily contain a 65-kDa component. When the crystal protein gene was shortened at the N-terminal region, the dominant protein product obtained was the 65-kDa peptide. A deletion derivative, p544Pst-Met5, was enzymatically derived from the 5.9 kb BamHI fragment upon removal of forty-six amino acid residues from the N-terminus. Expression of the N-terminal deletion derivative p544Pst-Met5 resulted in the production of, almost exclusively, the 65 kDa protein. Recently, McPherson et al. (1988) Biotechnology 6:61-66 demonstrated that the Btt gene contains two functional translational initiation codons in the same reading frame, leading to the production of both the full-length protein and an N-terminal truncated form.

Chimeric toxin genes from several strains of Bt have been expressed in plants. Four modified Bt2 genes from var. berliner 1715, under the control of the 2' promoter of the Agrobacterium TR-DNA, were transferred into tobacco plants (Vaeck et al. (1987) Nature 328:33-37). Insecticidal levels of toxin were produced when truncated genes were expressed in transgenic plants. However, the steady-state mRNA levels in the transgenic plants were so low that they could not be reliably detected in Northern blot analysis and hence were quantified using ribonuclease protection experiments. Bt mRNA levels in plants producing the highest level of protein corresponded to approximately 0.0001% of the poly(A)+ mRNA. In the report by Vaeck et al. (1987) supra, expression of chimeric genes containing the entire coding sequence of Bt2 was compared to that of genes containing truncated Bt2 genes. Additionally, some T-DNA constructs included a chimeric NPTII gene as a marker selectable in plants, whereas other constructs carried translational fusions between fragments of Bt2 and the NPTII gene. Insecticidal levels of toxin were produced when truncated Bt2 genes or fusion constructs were expressed in transgenic plants. Greenhouse-grown plants produced approximately 0.02% of the total soluble protein as the toxin, or Zng of toxin per g fresh leaf tissue, and, even at five-fold lower levels, showed 100% mortality in six-day feeding assays. However, no significant insecticidal activity could be obtained using the intact Bt2 coding sequence, despite the fact that the same promoter was used to direct its expression. Intact Bt2 protein and RNA yields in the transgenic plant leaves were 10-50 times lower than those for the truncated Bt2 polypeptides or fusion proteins.

Barton et al. (1987) Plant Physiol. 85:1103-1108 showed expression of a Bt protein in a system containing a 35S promoter, a viral (TMV) leader sequence, the Bt HD-1 4.5 kb gene (encoding an 845 amino acid protein followed by two proline residues) and a nopaline synthase (nos) poly(A)+ sequence. Under these conditions, Bt mRNA was expressed at levels up to 47 pg/20 µg RNA and Bt protein at 12 ng/mg plant protein. This amount of Bt protein in plant tissue produced 100% mortality in two days. This level of expression still represents a low level of mRNA (2.5 × 10⁻⁴%) and protein (1.2 × 10⁻³%).
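The percentages quoted for Barton et al. follow from the stated ratios by unit conversion alone. The Python sketch below redoes that arithmetic, assuming both ratios are mass per mass (pg of Bt mRNA per µg of total RNA, ng of Bt protein per mg of plant protein); the constant and function names are illustrative only.

```python
# Re-derive the percentages quoted for Barton et al. (1987) from the stated ratios.
# Assumption: both ratios are mass/mass, so only unit conversion is needed.

PICOGRAM = 1e-12   # grams
NANOGRAM = 1e-9    # grams
MICROGRAM = 1e-6   # grams
MILLIGRAM = 1e-3   # grams


def as_percent(part_g: float, whole_g: float) -> float:
    """Express part/whole (both in grams) as a percentage."""
    return part_g / whole_g * 100.0


# Bt mRNA: up to 47 pg per 20 ug of RNA
mrna_pct = as_percent(47 * PICOGRAM, 20 * MICROGRAM)

# Bt protein: 12 ng per mg of plant protein
protein_pct = as_percent(12 * NANOGRAM, 1 * MILLIGRAM)

print(f"Bt mRNA:    {mrna_pct:.2e} % of RNA")
print(f"Bt protein: {protein_pct:.2e} % of plant protein")
```

The protein ratio reproduces 1.2 × 10⁻³% exactly; the mRNA ratio works out to about 2.35 × 10⁻⁴%, the same order of magnitude as the mRNA figure quoted above.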

Various hybrid proteins consisting of N-terminal fragments of increasing length of the Bt2 protein fused to NPTII were produced in E. coli by Hofte et al. (1988) FEBS Lett. 226:364-370. Fusion proteins containing the first 607 amino acids of Bt2 exhibited insect toxicity; fusion proteins not containing this minimum N-terminal fragment were nontoxic. Appearance of NPTII activity was not dependent upon the presence of insecticidal activity; however, the conformation of the Bt2 polypeptide appeared to exert an important influence on the enzymatic activity of the fused NPTII protein. This study did suggest that the global 3-D structure of the Bt2 polypeptide is disturbed in truncated polypeptides.

A number of researchers have attempted to express plant genes in yeast (Neill et al. (1987) Gene 55:303-317; Rothstein et al. (1987) Gene 55:353-356; Coraggio et al. (1986) EMBO J. 5:459-465) and E. coli (Fukazawa et al. (1987) FEBS Lett. 224:125-127; Vies et al. (1986) EMBO J. 5:2439-2444; Gatenby et al. (1987) Eur. J. Biochem. 168:227-231). In the case of wheat α-gliadin (Neill et al. (1987) supra) and α-amylase (Rothstein et al. (1987) supra) genes, and maize zein genes (Coraggio et al. (1986) supra), low levels of expression in yeast have been reported. Neill et al. have suggested that the low levels of expression of α-gliadin in yeast may be due in part to codon usage bias, since α-gliadin codons for Phe, Leu, Ser, Gly, Tyr and especially Glu do not correlate well with the abundant yeast isoacceptor tRNAs. In E. coli, however, soybean glycinin A2 (Fukazawa et al. (1987) supra) and wheat RuBPC SSU (Vies et al. (1986) supra; Gatenby et al. (1987) supra) are expressed adequately.

Not much is known about the makeup of tRNA populations in plants. Viotti et al. (1978) Biochim. Biophys. Acta 517:125-132 report that maize endosperm actively synthesizing zein, a storage protein rich in glutamine, leucine and alanine, is characterized by higher levels of accepting activity for these three amino acids than are maize embryo tRNAs. This may indicate that the tRNA population of specific plant tissues is adapted for optimum translation of highly expressed proteins such as zein. To our knowledge, no one has experimentally altered codon bias in highly expressed plant genes to determine its possible effects on protein translation in plants and on the level of expression.

SUMMARY OF THE INVENTION 

It is the overall object of the present invention to provide a means for plant protection against insect damage. The invention disclosed herein comprises a chemically synthesized gene encoding an insecticidal protein which is functionally equivalent to a native insecticidal protein of Bt. This synthetic gene is designed to be expressed in plants at a level higher than a native Bt gene. It is preferred that the synthetic gene be designed to be highly expressed in plants as defined herein. Preferably, the synthetic gene is at least approximately 85% homologous to an insecticidal protein gene of Bt.

It is a particular object of this invention to provide a synthetic structural gene coding for an insecticidal protein from Btt having, for example, the nucleotide sequences presented in Figure 1, spanning nucleotides 1 through 1793 or nucleotides 1 through 1833, with functional equivalence.

In designing synthetic Btt genes of this invention for enhanced expression in plants, the DNA sequence of the native Btt structural gene is modified in order to contain codons preferred by highly expressed plant genes, to attain an A+T content in nucleotide base composition substantially that found in plants, and also preferably to form a plant initiation sequence, to eliminate sequences that cause destabilization, inappropriate polyadenylation, degradation and termination of RNA, and to avoid sequences that constitute secondary structure hairpins and RNA splice sites. In the synthetic genes, codons used to specify a given amino acid are selected with regard to the distribution frequency of codon usage employed in highly expressed plant genes to specify that amino acid. As is appreciated by those skilled in the art, the distribution frequency of codon usage utilized in the synthetic gene is a determinant of the level of expression. Hence, the synthetic gene is designed such that its distribution frequency of codon usage deviates, preferably, no more than 25% from that of highly expressed plant genes and, more preferably, no more than about 10%. In addition, consideration is given to the percentage G+C content of the degenerate third base (monocotyledons appear to favor G+C in this position, whereas dicotyledons do not). It is also recognized that the XCG codon is the least preferred codon in dicots, whereas the XTA codon is avoided in both monocots and dicots. The synthetic genes of this invention also preferably have CG and TA doublet avoidance indices, as defined in the Detailed Description, closely approximating those of the chosen host plant. More preferably these indices deviate from those of the host by no more than about 10-15%.
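The codon-level criteria in the preceding paragraph can be checked mechanically on a candidate sequence. The Python sketch below is a minimal illustration, not the procedure used in the invention: it assumes an in-frame coding sequence, reports the fraction of codons with G or C at the degenerate third position, and counts codons of the XCG and XTA classes (X being any base). The function names are hypothetical, and the CG and TA doublet avoidance indices are not computed here.

```python
# Sketch: codon-level composition checks for an in-frame coding sequence.
# "XCG" and "XTA" denote codons ending in CG or TA, where X is any base.


def split_codons(seq: str) -> list:
    """Split an in-frame coding sequence into codons, ignoring a partial final codon."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def third_position_gc(seq: str) -> float:
    """Fraction of codons whose third (degenerate) base is G or C."""
    codons = split_codons(seq)
    if not codons:
        return 0.0
    return sum(codon[2] in "GC" for codon in codons) / len(codons)


def count_xcg_xta(seq: str) -> tuple:
    """Return (number of XCG codons, number of XTA codons)."""
    codons = split_codons(seq)
    xcg = sum(codon[1:] == "CG" for codon in codons)
    xta = sum(codon[1:] == "TA" for codon in codons)
    return xcg, xta


# Hypothetical six-codon example: ATG GCT ACG GTA TTA GGC
example = "ATGGCTACGGTATTAGGC"
print(third_position_gc(example))  # 0.5
print(count_xcg_xta(example))      # (1, 2)
```

For a dicot host, a design following the text would aim to keep both counts low; for a monocot host, a higher third-position G+C fraction would be acceptable.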

Assembly of the Bt gene of this invention is performed using standard technology known to the art. The Btt structural gene designed for enhanced expression in plants of the specific embodiment is enzymatically assembled within a DNA vector from chemically synthesized oligonucleotide duplex segments. The synthetic Bt gene is then introduced into a plant host cell and expressed by means known to the art. The insecticidal protein produced upon expression of the synthetic Bt gene in plants is functionally equivalent to a native Bt crystal protein in having toxicity to the same insects.



BRIEF DESCRIPTION OF THE FIGURES

Figure 1 presents the nucleotide sequence for the synthetic Btt gene. Where different, the native sequence as found in p544Pst-Met5 is shown above. Changes in amino acids (underlined) occur in the synthetic sequence, with alanine replacing threonine at residue 2 and leucine replacing the stop at residue 598, followed by the addition of 13 amino acids at the C-terminus.

Figure 2 represents a simplified scheme used in the construction of the synthetic Btt gene. Segments A through M represent oligonucleotide pieces annealed and ligated together to form DNA duplexes having unique splice sites to allow specific enzymatic assembly of the DNA segments to give the desired gene.

Figure 3 is a schematic diagram showing the assembly of oligonucleotide segments in the construction of a synthetic Btt gene. Each segment (A through M) is built from oligonucleotides of different sizes, annealed and ligated to form the desired DNA segment.



DETAILED DESCRIPTION OF THE INVENTION 

The following definitions are provided to clarify the intent and scope of these terms as used in the Specification and claims.

Expression refers to the transcription and translation of a structural gene to yield the encoded protein. The synthetic Bt genes of the present invention are designed to be expressed at a higher level in plants than the corresponding native Bt genes. As will be appreciated by those skilled in the art, structural gene expression levels are affected by the regulatory DNA sequences (promoter, polyadenylation sites, enhancers, etc.) employed and by the host cell in which the structural gene is expressed. Comparisons of synthetic Bt gene expression and native Bt gene expression must be made employing analogous regulatory sequences and in the same host cell. It will also be apparent that analogous means of assessing gene expression must be employed in such comparisons.

Promoter refers to the nucleotide sequences at the 5' end of a structural gene which direct the initiation of transcription. Promoter sequences are necessary, but not always sufficient, to drive the expression of a downstream gene. In prokaryotes, the promoter drives transcription by providing binding sites to RNA polymerases and other initiation and activation factors. Usually promoters drive transcription preferentially in the downstream direction, although promotional activity can be demonstrated (at a reduced level of expression) when the gene is placed upstream of the promoter. The level of transcription is regulated by promoter sequences. Thus, in the construction of heterologous promoter/structural gene combinations, the structural gene is placed under the regulatory control of a promoter such that the expression of the gene is controlled by promoter sequences. The promoter is positioned preferentially upstream to the structural gene and at a distance from the transcription start site that approximates the distance between the promoter and the gene it controls in its natural setting. As is known in the art, some variation in this distance can be tolerated without loss of promoter function.

A gene refers to the entire DNA portion involved in the synthesis of a protein. A gene embodies the structural or coding portion, which begins at the 5' end from the translational start codon (usually ATG) and extends to the stop (TAG, TGA or TAA) codon at the 3' end. It also contains a promoter region, usually located 5' or upstream to the structural gene, which initiates and regulates the expression of a structural gene. Also included in a gene are the 3' end and poly(A)+ addition sequences.
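The boundaries of the coding portion described above, a start codon (usually ATG) through the first in-frame TAG, TGA or TAA, can be located with a simple scan. The Python sketch below is a simplification that assumes a single strand, no introns, and that the first ATG with a downstream in-frame stop marks the coding portion; `find_coding_portion` is a hypothetical helper, not part of any library.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAG", "TGA", "TAA"}


def find_coding_portion(seq: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first ATG...stop region, with end exclusive so that
    seq[start:end] includes the stop codon; None if no such region exists."""
    seq = seq.upper()
    start = seq.find("ATG")
    while start != -1:
        # Walk codon by codon in the reading frame set by this ATG.
        for i in range(start, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                return start, i + 3
        # No in-frame stop downstream of this ATG; try the next ATG.
        start = seq.find("ATG", start + 1)
    return None


dna = "CCATGGCTTAAGG"
span = find_coding_portion(dna)
print(span, dna[span[0]:span[1]])  # (2, 11) ATGGCTTAA
```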

Structural gene is that portion of a gene comprising a DNA segment encoding a protein, polypeptide or a portion thereof, and excluding the 5' sequence which drives the initiation of transcription. The structural gene may be one which is normally found in the cell or one which is not normally found in the cellular location wherein it is introduced, in which case it is termed a heterologous gene. A heterologous gene may be derived in whole or in part from any source known to the art, including a bacterial genome or episome, eukaryotic, nuclear or plasmid DNA, cDNA, viral DNA or chemically synthesized DNA. A structural gene may contain one or more modifications in either the coding or the untranslated regions which could affect the biological activity or the chemical structure of the expression product, the rate of expression or the manner of expression control. Such modifications include, but are not limited to, mutations, insertions, deletions and substitutions of one or more nucleotides. The structural gene may constitute an uninterrupted coding sequence or it may include one or more introns, bounded by the appropriate splice junctions. The structural gene may be a composite of segments derived from a plurality of sources, naturally occurring or synthetic. The structural gene may also encode a fusion protein.

Synthetic gene refers to a DNA sequence of a structural gene that is chemically synthesized in its entirety or for the greater part of the coding region. As exemplified herein, oligonucleotide building blocks are synthesized using procedures known to those skilled in the art and are annealed and ligated to form gene segments, which are then enzymatically assembled to construct the entire gene. As is recognized by those skilled in the art, genes functionally and structurally equivalent to the synthetic genes described herein may be prepared by site-specific mutagenesis or other related methods used in the art.
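Assembling a duplex from annealed and ligated oligonucleotides is commonly planned by splitting each strand into pieces whose junctions are staggered, so that every nick on one strand is bridged by an intact stretch of the other. The Python sketch below illustrates only that idea; it assumes fixed-length pieces with a half-length offset, whereas the segments described for Figures 2 and 3 are built from oligonucleotides of different sizes. All names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def staggered_oligos(top: str, size: int):
    """Split a duplex into top-strand and bottom-strand oligos of about `size` nt.

    Top-strand junctions fall at multiples of `size`; bottom-strand junctions are
    shifted by size // 2, so each junction lies opposite an intact stretch of the
    other strand. Bottom oligos are written 5'->3' and listed left to right in
    top-strand coordinates.
    """
    if size < 2:
        raise ValueError("size must be at least 2")
    top = top.upper()
    n = len(top)
    top_oligos = [top[i:i + size] for i in range(0, n, size)]
    # Cut positions (in top-strand coordinates) for the bottom strand.
    cuts = [0] + list(range(size // 2, n, size)) + [n]
    bottom_oligos = [reverse_complement(top[a:b]) for a, b in zip(cuts, cuts[1:])]
    return top_oligos, bottom_oligos


segment = "ATGGCTACGGTATTAGGCTA"  # hypothetical 20-nt segment
top_oligos, bottom_oligos = staggered_oligos(segment, 8)
print(top_oligos)     # ['ATGGCTAC', 'GGTATTAG', 'GCTA']
print(bottom_oligos)  # ['CCAT', 'TACCGTAG', 'TAGCCTAA']
```

Joining the top oligos gives back the top strand, and joining the bottom oligos in reverse order gives its reverse complement, which is the property that lets the annealed pieces be ligated into one continuous duplex.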

Transforming refers to stably introducing a DNA segment carrying a functional gene into an organism that did not previously contain that gene.

Plant tissue includes differentiated and undifferentiated tissues of plants, including but not limited to roots, shoots, leaves, pollen, seeds, tumor tissue and various forms of cells in culture, such as single cells, protoplasts, embryos and callus tissue. The plant tissue may be in planta or in organ, tissue or cell culture.

Plant cell as used herein includes plant cells in planta and plant cells and protoplasts in culture.

Homology refers to identity or near identity of nucleotide or amino acid sequences. As is understood in the art, nucleotide mismatches can occur at the third or wobble base in the codon without causing amino acid substitutions in the final polypeptide sequence. Also, minor nucleotide modifications (e.g., substitutions, insertions or deletions) in certain regions of the gene sequence can be tolerated and considered insignificant whenever such modifications result in changes in amino acid sequence that do not alter functionality of the final product. It has been shown that chemically synthesized copies of whole, or parts of, gene sequences can replace the corresponding regions in the natural gene without loss of gene function. Homologs of specific DNA sequences may be identified by those skilled in the art using the test of cross-hybridization of nucleic acids under conditions of stringency, as is well understood in the art (as described in Hames and Higgins (eds.) (1985) Nucleic Acid Hybridization, IRL Press, Oxford, UK). Extent of homology is often measured in terms of percentage of identity between the sequences compared.
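Two points in the definition above, percent identity as a measure of homology and third-base changes that leave the encoded amino acid unchanged, can be illustrated together. The Python sketch below assumes two already aligned, gap-free coding sequences of equal length (real comparisons would first align the sequences, or rely on cross-hybridization as described above) and uses the standard genetic code; the example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str) -> str:
    """Translate an in-frame coding sequence (partial final codon ignored)."""
    seq = seq.upper()
    return "".join(GENETIC_CODE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions between two aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


native = "CTTGTAGCT"     # Leu-Val-Ala
synthetic = "CTCGTTGCC"  # Leu-Val-Ala, third bases changed
print(round(percent_identity(native, synthetic), 1))  # 66.7
print(translate(native), translate(synthetic))        # LVA LVA
```

Here the two sequences share only two thirds of their nucleotides yet encode the same tripeptide, which is the situation the definition describes for wobble-base mismatches.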

Functionally equivalent refers to identity or near identity of function. A synthetic gene product which is toxic to at least one of the same insect species as a natural Bt protein is considered functionally equivalent thereto. As exemplified herein, both natural and synthetic Btt genes encode 65 kDa insecticidal proteins having essentially identical amino acid sequences and having toxicity to coleopteran insects. The synthetic Bt genes of the present invention are not considered to be functionally equivalent to native Bt genes, since they are expressible at a higher level in plants than native Bt genes.

Frequency of preferred codon usage refers to the preference exhibited by a specific host cell in usage of nucleotide codons to specify a given amino acid. To determine the frequency of usage of a particular codon in a gene, the number of occurrences of that codon in the gene is divided by the total number of occurrences of all codons specifying the same amino acid in the gene. Table 1, for example, gives the frequency of codon usage for Bt genes, which was obtained by analysis of four Bt genes whose sequences are publicly available. Similarly, the frequency of preferred codon usage exhibited by a host cell can be calculated by averaging the frequency of preferred codon usage in a large number of genes expressed by the host cell. It is preferable that this analysis be limited to genes that are highly expressed by the host cell.
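The per-codon frequency defined above (occurrences of a codon divided by occurrences of all codons for the same amino acid) is straightforward to compute. The Python sketch below does so for a single in-frame coding sequence using the standard genetic code, skipping stop codons. For a host cell the text calls for averaging over many, preferably highly expressed, genes; here that is approximated by averaging per-gene frequencies over the genes in which each amino acid occurs, which is one reasonable reading rather than a prescribed method. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_frequencies(seq: str) -> dict:
    """Frequency of each codon among all codons for the same amino acid in one gene."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    # Skip stop codons and anything that is not a valid A/C/G/T codon.
    counts = Counter(c for c in codons if GENETIC_CODE.get(c, "*") != "*")
    per_amino_acid = Counter()
    for codon, n in counts.items():
        per_amino_acid[GENETIC_CODE[codon]] += n
    return {codon: n / per_amino_acid[GENETIC_CODE[codon]] for codon, n in counts.items()}


def host_codon_frequencies(genes: list) -> dict:
    """Average per-gene frequencies over the genes in which each amino acid occurs.

    A codon missing from a gene that does use its amino acid contributes 0.
    """
    totals = defaultdict(float)
    genes_with_aa = Counter()
    for gene in genes:
        freqs = codon_frequencies(gene)
        for aa in {GENETIC_CODE[c] for c in freqs}:
            genes_with_aa[aa] += 1
        for codon, f in freqs.items():
            totals[codon] += f
    return {codon: t / genes_with_aa[GENETIC_CODE[codon]] for codon, t in totals.items()}


gene_1 = "GTAGTTGTTGCT"  # hypothetical: GTA GTT GTT GCT
gene_2 = "GTTGCC"        # hypothetical: GTT GCC
print(codon_frequencies(gene_1))
print(host_codon_frequencies([gene_1, gene_2]))
```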

Table 1, for example, also gives the frequency of codon usage by highly expressed genes exhibited by dicotyledonous plants and monocotyledonous plants. The dicot codon usage was calculated using 154 highly expressed coding sequences obtained from GenBank, which are listed in Table 1. Monocot codon usage was calculated using 53 monocot nuclear gene coding sequences obtained from GenBank and listed in Table 1, located in Example 1.

When synthesizing a gene for improved expression in a host cell, it is desirable to design the gene such that its frequency of codon usage approaches the frequency of preferred codon usage of the host cell.

The percent deviation of the frequency of preferred codon usage for a synthetic gene from that employed by a host cell is calculated by first determining the percent deviation of the frequency of usage of a single codon from that of the host cell and then obtaining the average deviation over all codons. As defined herein, this calculation includes unique codons (i.e., ATG and TGG). The frequency of preferred codon usage of the synthetic Btt gene, whose sequence is given in Figure 1, is given in Table 1. The frequency of preferred usage of the codon GTA for valine in the synthetic gene (0.10) deviates from that preferred by dicots (0.12) by 0.02/0.12 = 0.167, or 16.7%. The average deviation over all amino acid codons of the Btt synthetic gene codon usage from that of dicot plants is 7.8%. In general terms, the overall average deviation A of the codon usage of a synthetic gene from that of a host cell is calculated using the equation

$$A = \frac{1}{Z} \sum_{n=1}^{Z} \frac{\left| X_n - Y_n \right|}{X_n} \times 100$$

where X_n = frequency of usage for codon n in the host cell; Y_n = frequency of usage for codon n in the synthetic gene; n represents an individual codon that specifies an amino acid; and the total number of codons is Z, which in the preferred embodiment is 61. The overall deviation A of the frequency of codon usage for all amino acids should preferably be less than about 25%, and more preferably less than about 10%.

Derived from is used to mean taken, obtained, received, traced, replicated or descended from a source (chemical and/or biological). A derivative may be produced by chemical or biological manipulation (including but not limited to substitution, addition, insertion, deletion, extraction, isolation, mutation and replication) of the original source.
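The average-deviation equation given above for codon usage can be evaluated directly from two codon-frequency tables. In the Python sketch below, `host` and `synthetic` are hypothetical dictionaries mapping codons to frequencies of usage, and codons absent from the synthetic table are treated as frequency 0. Because the equation divides by X_n, codons the host never uses are skipped, an assumption the text does not address. The single-codon call reproduces the GTA example (0.02/0.12, about 16.7%).

```python
def average_codon_deviation(host: dict, synthetic: dict) -> float:
    """Overall average percent deviation A of synthetic-gene codon usage from the host.

    host, synthetic: codon -> frequency of usage (each codon's share of the codons
    for its amino acid). Averages |X_n - Y_n| / X_n over host codons with X_n > 0
    and scales the result to a percentage.
    """
    terms = [
        abs(x - synthetic.get(codon, 0.0)) / x
        for codon, x in host.items()
        if x > 0
    ]
    if not terms:
        raise ValueError("host table has no codons with non-zero frequency")
    return 100.0 * sum(terms) / len(terms)


# GTA (valine) example from the text: dicot 0.12, synthetic gene 0.10.
print(round(average_codon_deviation({"GTA": 0.12}, {"GTA": 0.10}), 1))  # 16.7

# Hypothetical two-codon table: the unique codon TGG always matches (1.0 vs 1.0),
# so including it halves the average.
print(round(average_codon_deviation({"GTA": 0.12, "TGG": 1.0},
                                    {"GTA": 0.10, "TGG": 1.0}), 1))  # 8.3
```

Passing tables that cover all 61 sense codons gives Z = 61, as in the preferred embodiment, provided the host uses every codon.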

Chemically synthesized, as related to a sequence of DNA, means that the component nucleotides were assembled in vitro. Manual chemical synthesis of DNA may be accomplished using well-established procedures (Caruthers, M. (1983) in Methodology of DNA and RNA Sequencing, Weissman (ed.), Praeger Publishers, New York, Chapter 1), or automated chemical synthesis can be performed using one of a number of commercially available machines.

The term designed to be highly expressed, as used herein, refers to a level of expression of a designed gene wherein the amount of its specific mRNA transcripts produced is sufficient to be quantified in Northern blots and, thus, represents a level of specific mRNA expressed corresponding to greater than or equal to approximately 0.001% of the poly(A)+ mRNA. To date, natural Bt genes are transcribed at a level wherein the amount of specific mRNA produced is insufficient to be estimated using the Northern blot technique. However, in the present invention, transcription of a synthetic Bt gene designed to be highly expressed not only allows quantification of the specific mRNA transcripts produced but also results in enhanced expression of the translation product, which is measured in insecticidal bioassays.

Crystal protein or insecticidal crystal protein or crystal toxin refers to the major protein component of the parasporal crystals formed in strains of Bt. This protein component exhibits selective pathogenicity to different species of insects. The molecular size of the major protein isolated from parasporal crystals varies depending on the strain of Bt from which it is derived. Crystal proteins having molecular weights of approximately 132, 65 and 28 kDa have been reported. It has been shown that the approximately 132 kDa protein is a protoxin that is cleaved to form an approximately 65 kDa toxin.

The crystal protein gene refers to the DNA sequence encoding the insecticidal crystal protein in either full-length protoxin or toxin form, depending on the strain of Bt from which the gene is derived.

The authors of this invention observed that expression in plants of Bt crystal protein mRNA occurs at levels that are not routinely detectable in Northern blots and that low levels of Bt crystal protein expression correspond to this low level of mRNA expression. It is preferred for exploitation of these genes as potential biocontrol methods that the level of expression of Bt genes in plant cells be improved and that the stability of Bt mRNA in plants be optimized. This will allow greater levels of Bt mRNA to accumulate and will result in an increase in the amount of insecticidal protein in plant tissue. This is essential for the control of insects that are relatively resistant to Bt protein.

Thus, this invention is based on the recognition that expression levels of desired, recombinant insecticidal protein in transgenic plants can be improved via increased expression of stabilized mRNA transcripts, and that, conversely, detection of these stabilized RNA transcripts may be utilized to measure expression of the translational product (protein). This invention provides a means of resolving the problem of low expression of insecticidal protein RNA in plants and, therefore, of low protein expression, through the use of an improved synthetic gene specifying an insecticidal crystal protein from Bt.

Attempts to improve the levels of expression of Bt genes in plants have centered on comparative studies evaluating parameters such as gene type, gene length, choice of promoters, addition of a plant viral untranslated RNA leader, addition of intron sequence and modification of nucleotides surrounding the initiation ATG codon. To date, changes in these parameters have not led to significant enhancement of Bt protein expression in plants. Applicants find that, surprisingly, modifications in the coding region of the gene were effective in expressing Bt proteins at the desired level in plants. Structure-function relationships can be studied using site-specific mutagenesis by replacement of restriction fragments with synthetic DNA duplexes containing the desired nucleotide changes (Lo et al. (1984) Proc. Natl. Acad. Sci. USA 81:2285-2289). However, recent advances in recombinant DNA technology now make it feasible to chemically synthesize an entire gene designed specifically for a desired function. Thus, the Btt coding region was chemically synthesized, modified in such a way as to improve its expression in plants. Gene synthesis also provides the opportunity to design the gene so as to facilitate its subsequent mutagenesis by incorporating a number of appropriately positioned restriction endonuclease sites into the gene.

The present invention provides a synthetic Bt gene for a crystal protein toxic to an insect. As exemplified herein, this protein is toxic to coleopteran insects. To the end of improving expression of this insecticidal protein in plants, this invention provides a DNA segment homologous to a Btt structural gene and, as exemplified herein, having approximately 85% homology to the Btt structural gene in p544Pst-Met5. In this embodiment, the structural gene encoding a Btt insecticidal protein is obtained through chemical synthesis of the coding region. A chemically synthesized gene is used in this embodiment because it best allows for easy and efficacious accommodation of the modifications in nucleotide sequence required to achieve improved levels of cross-expression.

Today, in general, chemical synthesis is a preferred method to obtain a desired modified gene. However, to date no plant protein gene has been chemically synthesized, nor has any synthetic gene for a bacterial protein been expressed in plants. In this invention, the approach adopted for synthesizing the gene consists of designing an improved nucleotide sequence for the coding region and assembling the gene from chemically synthesized oligonucleotide segments. In designing the gene, the coding region of the naturally occurring gene, preferably from the Btt subclone p544Pst-Met5 encoding a 65 kDa polypeptide having coleopteran toxicity, is scanned for possible modifications which would result in improved expression of the synthetic gene in plants. For example, to optimize the efficiency of translation, codons preferred in highly expressed proteins of the host cell are utilized.

Bias in codon choice within genes in a single species appears related to the level of expression of the protein encoded by that gene. Codon bias is most extreme in highly expressed proteins of E. coli and yeast. In these organisms, a strong positive correlation has been reported between the abundance of an isoaccepting tRNA species and the favored synonymous codon. In one group of highly expressed proteins in yeast, over 86% of the amino acids are encoded by only 25 of the 61 available codons (Bennetzen and Hall (1982) J. Biol. Chem. 257:3026-3031).

These 25 codons are preferred in all sequenced yeast genes, but the degree of preference varies with the level of expression of the genes. Recently, Hoekema and colleagues (1987) Mol. Cell. Biol. 7:2914-2924 reported that replacement of these 25 preferred codons by minor codons in the 5' end of the highly expressed yeast gene PGK1 results in a decreased level of both protein and mRNA. They concluded that biased codon choice in highly expressed genes enhances translation and is required for maintaining mRNA stability in yeast. Without doubt, the degree of codon bias is an important factor to consider when engineering high expression of heterologous genes in yeast and other systems.

Experimental evidence obtained from point mutations and deletion analysis has indicated that in eukaryotic genes specific sequences are associated with posttranscriptional processing, RNA destabilization, translational termination, intron splicing and the like. These are preferably employed in the synthetic genes of this invention. In designing a bacterial gene for expression in plants, sequences which interfere with the efficacy of gene expression are eliminated.

In designing a synthetic gene, modifications in the nucleotide sequence of the coding region are made to adjust the A+T content of the DNA base composition of the synthetic gene to reflect that normally found in genes for highly expressed proteins native to the host cell. Preferably the A+T content of the synthetic gene is substantially equal to that of said genes for highly expressed proteins. In genes encoding highly expressed plant proteins, the A+T content is approximately 55%. It is preferred that the synthetic gene have an A+T content near this value, and not so high as to cause destabilization of RNA and, therefore, lower protein expression levels. More preferably, the A+T content is no more than about 60%, and most preferably it is about 55%. Also, for ultimate expression in plants, the synthetic gene nucleotide sequence is preferably modified to form a plant initiation sequence at the 5' end of the coding region. In addition, particular attention is preferably given to assure that unique restriction sites are placed in strategic positions to allow efficient assembly of oligonucleotide segments during construction of the synthetic gene and to facilitate subsequent nucleotide modification. As a result of these modifications in the coding region of the native Bt gene, the preferred synthetic gene is expressed in plants at an enhanced level when compared to that observed with natural Bt structural genes.
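The A+T criterion above amounts to a simple base count. The sketch below is illustrative only and is not taken from the disclosure; it assumes an unambiguous DNA string, and the helper names `at_content` and `within_at_limit` are hypothetical. The default 60% ceiling mirrors the "no more than about 60%" preference stated above.

```python
def at_content(seq: str) -> float:
    """Fraction of A and T bases in a DNA sequence (0.0 to 1.0)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence is empty")
    return sum(base in "AT" for base in seq) / len(seq)


def within_at_limit(seq: str, limit: float = 0.60) -> bool:
    """True if the A+T fraction does not exceed `limit` (about 60% per the text)."""
    return at_content(seq) <= limit


# Usage: the first three codons of p544Pst-Met5 (ATG ACT GCA) are 5/9 A+T.
start = "ATGACTGCA"
print(f"{at_content(start):.1%}", within_at_limit(start))  # prints 55.6% True
```

Applied to a whole coding region, the same count gives the 64% (native Btt) versus 55% (synthetic) comparison discussed in the text.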

In specific embodiments, the synthetic Bt gene of this invention encodes a Btt protein toxic to coleopteran insects. Preferably, the toxic polypeptide is about 598 amino acids in length, is at least 75% homologous to a Btt polypeptide, and, as exemplified herein, is essentially identical to the protein encoded by p544Pst-Met5, except for replacement of threonine by alanine at residue 2. This amino acid substitution results from the necessity to introduce a guanine base at position +4 in the coding sequence.
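The effect of the +4 substitution can be checked by translation. This minimal sketch assumes numbering from the A of the ATG start codon as +1, and it uses a deliberately partial codon table (only the codons needed here, not the full genetic code). The starting sequence is the Met-Thr-Ala start (ATG ACT GCA) carried by the synthetic linker of Example 1. The function names are hypothetical.

```python
# Partial codon table: only the codons used in this example (an assumption,
# not a complete genetic code).
CODON_TABLE = {"ATG": "Met", "ACT": "Thr", "GCT": "Ala", "GCA": "Ala"}


def translate(seq: str) -> list[str]:
    """Translate the complete codons of an in-frame sequence using CODON_TABLE."""
    return [CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3)]


def substitute(seq: str, position: int, base: str) -> str:
    """Replace the base at a 1-based position (+1 = A of the ATG start codon)."""
    i = position - 1
    return seq[:i] + base + seq[i + 1:]


native = "ATGACTGCA"                   # Met-Thr-Ala
modified = substitute(native, 4, "G")  # A -> G at position +4
print(translate(native))    # ['Met', 'Thr', 'Ala']
print(translate(modified))  # ['Met', 'Ala', 'Ala']
```

Because ACN codons specify Thr and GCN codons specify Ala, an A-to-G change at the first base of codon 2 necessarily produces this Thr-to-Ala replacement.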

In designing the synthetic gene of this invention, the coding region from the Btt subclone p544Pst-Met5, encoding a 65 kDa polypeptide having coleopteran toxicity, is scanned for possible modifications which would result in improved expression of the synthetic gene in plants. For example, in preferred embodiments the synthetic insecticidal protein is strongly expressed in dicot plants, e.g., tobacco, tomato, cotton, etc., and hence a synthetic gene for these conditions is designed to incorporate codons used preferentially by highly expressed dicot proteins. In embodiments where enhanced expression of insecticidal protein is desired in a monocot, codons preferred by highly expressed monocot proteins (given in Table 1) are employed in designing the synthetic gene.
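For most amino acids, columns (A) and (C) of Table 1 are close, which may indicate that codons in the synthetic gene were distributed roughly in proportion to dicot usage. One way to turn such fractions into whole-number codon choices is largest-remainder apportionment. The sketch below uses that technique purely as an illustration; the text does not state that this procedure was used. The Leu fractions come from column (A) of Table 1, and the name `allocate_codons` is hypothetical.

```python
import math

# Dicot Leu codon fractions from column (A) of Table 1 (they sum to 1.01 after rounding).
DICOT_LEU = {"TTG": 0.26, "TTA": 0.10, "CTG": 0.09,
             "CTA": 0.08, "CTT": 0.29, "CTC": 0.19}


def allocate_codons(n_sites: int, fractions: dict[str, float]) -> dict[str, int]:
    """Split n_sites codon positions among synonymous codons in proportion to
    `fractions`, using largest-remainder rounding so the counts sum to n_sites."""
    total = sum(fractions.values())  # renormalize the rounded published fractions
    quotas = {codon: n_sites * f / total for codon, f in fractions.items()}
    counts = {codon: math.floor(q) for codon, q in quotas.items()}
    leftover = n_sites - sum(counts.values())
    # Hand the remaining positions to the codons with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - counts[c], reverse=True)
    for codon in by_remainder[:leftover]:
        counts[codon] += 1
    return counts


# Usage: a hypothetical coding region with 100 leucine positions.
print(allocate_codons(100, DICOT_LEU))
```

In practice the counts would only be a starting point. Which codon goes at which position still has to satisfy the other constraints described here, such as avoiding CG and TA doublets, deleterious motifs, and unwanted restriction sites.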

In general, genes within a taxonomic group exhibit similarities in codon choice, regardless of the function of these genes. Thus an estimate of the overall use of the genetic code by a taxonomic group can be obtained by summing codon frequencies of all its sequenced genes. This species-specific codon choice is reported in this invention from analysis of 208 plant genes. Monocot and dicot plants are analyzed separately to determine whether these broader taxonomic groups are characterized by different patterns of synonymous codon preference. The 208 plant genes included in the codon analysis code for proteins having a wide range of functions, and they represent 8 monocot and 38 dicot species. These proteins are present in different plant tissues at varying levels of expression.
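Summing codon counts across all genes of a group, then expressing each codon as a fraction of its amino acid's total, gives figures like the distribution fractions in Table 1. The sketch below assumes in-frame coding sequences. To keep it short, it uses a partial synonym table covering only phenylalanine. The gene sequences and function names are hypothetical.

```python
from collections import Counter

# Partial synonymous-codon table (Phe only), an illustrative assumption.
SYNONYMS = {"Phe": ["TTT", "TTC"]}


def codon_counts(seq: str) -> Counter:
    """Count the complete codons in an in-frame coding sequence."""
    return Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))


def pooled_fractions(genes: list[str], synonyms: dict[str, list[str]]) -> dict:
    """Sum codon counts over all genes, then give each codon's share of its amino acid."""
    total = Counter()
    for gene in genes:
        total.update(codon_counts(gene))
    fractions = {}
    for amino_acid, codons in synonyms.items():
        n = sum(total[c] for c in codons)
        if n:  # skip amino acids absent from the sample
            fractions[amino_acid] = {c: total[c] / n for c in codons}
    return fractions


# Two hypothetical genes containing 2 TTT and 3 TTC codons in total.
print(pooled_fractions(["TTTTTC", "TTCTTCTTT"], SYNONYMS))
```

Pooling raw counts weights each gene by its length. This is the "summing codon frequencies of all its sequenced genes" approach, not an average of per-gene fractions.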

In this invention it is shown that the relative use of synonymous codons differs between the monocots and the dicots. In general, the most important factor in discriminating between monocot and dicot patterns of codon usage is the percentage G+C content of the degenerate third base. In monocots, 16 of 18 amino acids favor G+C in this position, while dicots favor G+C in only 7 of 18 amino acids.
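The G+C content of the degenerate third base (often written GC3) can be computed as shown below. This is a minimal sketch that assumes an in-frame coding sequence. It excludes the single-codon amino acids Met (ATG) and Trp (TGG), whose third base is not degenerate; that exclusion is a choice of the sketch. The function name is hypothetical.

```python
# Met and Trp each have a single codon, so their third base is not degenerate.
NON_DEGENERATE = {"ATG", "TGG"}


def gc3(seq: str) -> float:
    """Fraction of G or C at codon position III, ignoring ATG and TGG."""
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    thirds = [codon[2] for codon in codons if codon not in NON_DEGENERATE]
    if not thirds:
        raise ValueError("no codons with a degenerate third position")
    return sum(base in "GC" for base in thirds) / len(thirds)


print(gc3("ATGCTCGACAAG"))  # CTC, GAC, AAG all end in G or C: 1.0
print(gc3("ATGCTTGATAAA"))  # CTT, GAT, AAA all end in A or T: 0.0
```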

The G-ending codons for Thr, Pro, Ala and Ser are avoided in both monocots and dicots because they contain C in codon position II. The CG dinucleotide is strongly avoided in plants (Boudraa (1987) Genet. Sel. Evol. 19:148-154) and other eukaryotes (Grantham et al. (1986) Bull. Inst. Pasteur …-148), possibly due to regulation involving methylation. In dicots, XCG is always the least favored codon, while in monocots this is not the case. The doublet TA is also avoided in codon positions II and III in most eukaryotes, and this is true of both monocots and dicots.

Grantham and colleagues ((1986) Oxford Surveys in Evol. Biol. 3:48-81) have developed two codon choice indices to quantify CG and TA doublet avoidance in codon positions II and III. XCG/XCC is the ratio of G-ending to C-ending triplets that have C as base II, while XTA/XTT is the ratio of A-ending to T-ending triplets that have T as base II. These indices have been calculated for the plant data herein (Table 2) and support the conclusion that monocot and dicot species differ in their use of these dinucleotides.

Table 2
Avoidance of CG and TA doublets in codon positions II-III.
XCG/XCC and XTA/XTT values are multiplied by 100.

Group        XCG/XCC   XTA/XTT
Plants       40        37
Dicots       30        35
Monocots     61        47
Maize        67        43
Soybean      37        41
RuBPC SSU    18        9
CAB          22        13

RuBPC SSU = ribulose 1,5-bisphosphate carboxylase small subunit
CAB = chlorophyll a/b binding protein
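The two doublet indices defined above can be computed directly from codon counts. This sketch assumes an in-frame coding sequence and reports both ratios multiplied by 100, as in Table 2. Returning NaN when a denominator is zero is an arbitrary choice. The function name is hypothetical.

```python
import math


def doublet_indices(seq: str) -> tuple[float, float]:
    """Return (XCG/XCC, XTA/XTT) x 100 for an in-frame coding sequence.

    XCG counts codons with C at position II and G at position III; XCC, XTA
    and XTT are defined the same way from the last two bases of each codon.
    """
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]

    def count(last_two: str) -> int:
        return sum(codon[1:] == last_two for codon in codons)

    xcg, xcc = count("CG"), count("CC")
    xta, xtt = count("TA"), count("TT")
    cg_index = 100 * xcg / xcc if xcc else math.nan
    ta_index = 100 * xta / xtt if xtt else math.nan
    return cg_index, ta_index


# Codons GCG, GCC, ACC, TTA, TTT, CTT: XCG/XCC = 100*1/2, XTA/XTT = 100*1/2
print(doublet_indices("GCGGCCACCTTATTTCTT"))  # (50.0, 50.0)
```

Lower values mean stronger avoidance. Under this reading, the RuBPC SSU and CAB rows of Table 2 (18 and 22, 9 and 13) reflect sharper CG and TA avoidance than the pooled samples.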

Additionally, species-specific codon usage profiles were calculated for two species, soybean and maize (not shown). The maize codon usage pattern resembles that of monocots in general, since these sequences represent over half of the monocot sequences available. The codon profile of the maize subsample is even more strikingly biased in its preference for G+C in codon position III. On the other hand, the soybean codon usage pattern is almost identical to the general dicot pattern, even though it represents a much smaller portion of the entire dicot sample.

To determine whether the coding strategy of highly expressed genes, such as those for the ribulose 1,5-bisphosphate carboxylase small subunit (RuBPC SSU) and the chlorophyll a/b binding protein (CAB), is more biased than that of plant genes in general, codon usage profiles for subsets of these genes (19 and 17 sequences, respectively) were calculated (not shown). The RuBPC SSU and CAB pooled samples are characterized by stronger avoidance of the codons XCG and XTA than the larger monocot and dicot samples (Table 2). Although most of the genes in these subsamples are dicot in origin (17/19 and 15/17), their codon profile resembles that of the monocots in that G+C is utilized in the degenerate base III.

The use of pooled data for highly expressed genes may obscure identification of species-specific patterns in codon choice. Therefore, the codon choices of individual genes for RuBPC SSU and CAB were tabulated. The preferred codons of the maize and wheat genes for RuBPC SSU and CAB are in general more restricted than are those of the dicot species. This is in agreement with Matsuoka et al. ((1987) J. Biochem. 102:…), who noted the extreme codon bias of the maize RuBPC SSU gene as well as of two other highly expressed genes in maize leaves, CAB and phosphoenolpyruvate carboxylase. These genes almost completely avoid the use of A+T in codon position III, although this codon bias was not as pronounced in non-leaf proteins such as alcohol dehydrogenase, zein 22 kDa subunit, sucrose synthetase and ATP/ADP translocator. Since the wheat SSU and CAB genes have a similar pattern of codon preference, this may reflect a common monocot pattern for these highly expressed genes in leaves. The CAB gene for Lemna and the RuBPC SSU genes for Chlamydomonas share a similar extreme preference for G+C in codon position III. In dicot CAB genes, however, A+T degenerate bases are preferred by some synonymous codons (e.g., GCT for Ala, CTT for Leu, GGA and GGT for Gly). In general, the G+C preference is less pronounced for both RuBPC SSU and CAB genes in dicots than in monocots.

In designing a synthetic gene for expression in plants, attempts are also made to eliminate sequences which interfere with the efficacy of gene expression. Sequences such as plant polyadenylation signals (e.g., AATAAA), polymerase II termination sequences (e.g., CAN(7-9)AGTNNAA), UCUUCGG hairpins and plant consensus splice sites are highlighted and, if present in the native Btt coding sequence, are modified so as to eliminate potentially deleterious sequences.
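Screening a coding sequence for the motifs listed above can be sketched with regular expressions. This is an illustration under stated assumptions, not the screening method of the disclosure. The polymerase II pattern is an assumed reading of the consensus as CA, then 7 to 9 arbitrary bases, then AGTNNAA. TCTTCGG is the DNA form of the UCUUCGG hairpin sequence. Plant splice-site consensus sequences are not given in the text and are omitted. The motif labels and function name are hypothetical.

```python
import re

# Motifs named in the text. The Pol II pattern is an assumed reading of the
# consensus: CA, 7-9 arbitrary bases, then AGTNNAA.
MOTIFS = {
    "polyadenylation signal": "AATAAA",
    "Pol II termination": "CA[ACGT]{7,9}AGT[ACGT]{2}AA",
    "UCUUCGG hairpin": "TCTTCGG",  # DNA form of UCUUCGG
}


def find_motifs(seq: str, motifs: dict[str, str] = MOTIFS) -> list[tuple[str, int, str]]:
    """Return (motif name, 0-based start, matched text) for every hit on this strand.

    Wrapping each pattern in a lookahead lets overlapping occurrences be reported.
    """
    seq = seq.upper()
    hits = []
    for name, pattern in motifs.items():
        for match in re.finditer(f"(?=({pattern}))", seq):
            hits.append((name, match.start(), match.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


print(find_motifs("ggaataaagg"))  # [('polyadenylation signal', 2, 'AATAAA')]
```

Only the strand supplied is searched. To check the complementary strand, reverse-complement the sequence and search it as well.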

Modifications in the nucleotide sequence of the Btt coding region are also preferably made to reduce the A+T content of the DNA base composition. The Btt coding region has an A+T content of 64%, which is about 10% higher than that found in a typical plant coding region. Since A+T-rich regions typify plant intergenic regions and plant regulatory regions, it is deemed prudent to reduce the A+T content. The synthetic Btt gene is designed to have an A+T content of 55%, in keeping with values usually found in plants.

Also, in the preferred embodiment a single modification (introducing guanine in lieu of adenine) is made at the fourth nucleotide position of the Btt coding sequence. This forms a sequence consonant with that believed to function as a plant initiation sequence (Taylor et al. (1987) Mol. Gen. Genet. 210:572-577), thereby optimizing expression. In addition, in exemplifying this invention, thirty-nine nucleotides (thirteen codons) are added to the coding region of the synthetic gene in an attempt to stabilize primary transcripts. However, it appears that equally stable transcripts are obtained in the absence of this thirty-nine-nucleotide extension.

Not all of the above-mentioned modifications of the natural Bt gene must be made in constructing a synthetic Bt gene in order to obtain enhanced expression. For example, a synthetic gene may be synthesized for other purposes in addition to that of achieving enhanced levels of expression. Under these conditions, the original sequence of the natural Bt gene may be preserved within a region of DNA corresponding to one or more, but not all, of the segments used to construct the synthetic gene. Depending on the desired purpose of the gene, modification may encompass substitution of one or more, but not all, of the oligonucleotide segments used to construct the synthetic gene by a corresponding region of natural Bt sequence.

As is known to those skilled in the art of synthesizing genes (Mandecki et al. (1985) Proc. Natl. Acad. Sci. 82:3543-3547; Ferretti et al. (1986) Proc. Natl. Acad. Sci. 83:599-603), the DNA sequence to be synthesized is divided into segment lengths which can be synthesized conveniently and without undue complication. As exemplified herein, in preparing to synthesize the Btt gene, the coding region is divided into thirteen segments (A-M). Each segment has unique restriction sequences at the cohesive ends. Segment A, for example, is 228 base pairs in length and is constructed from six oligonucleotide sections, each containing approximately 75 bases. Single-stranded oligonucleotides are annealed and ligated to form DNA segments. The length of the protruding cohesive ends in complementary oligonucleotide segments is four to five residues. In the strategy evolved for gene synthesis, the sites designed for the joining of oligonucleotide pieces and DNA segments are different from the restriction sites created in the gene.
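The division of a segment such as segment A (228 bp, six oligonucleotides of about 75 bases, i.e. three per strand) into overlapping oligonucleotides with four-base protruding ends can be sketched as follows. This illustration rests on several assumptions and is not the procedure used for the Btt gene. Junctions are evenly spaced and the stagger is fixed at four bases. Segment termini are left blunt rather than carrying the restriction-site cohesive ends described above. No attempt is made to avoid repeated or self-complementary overhangs. The function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def split_segment(segment: str, pieces_per_strand: int = 3, stagger: int = 4):
    """Divide a double-stranded segment into top- and bottom-strand oligonucleotides.

    Bottom-strand junctions are shifted `stagger` bases from top-strand junctions,
    so top[i] and bottom[i] anneal into a duplex whose internal ends protrude by
    `stagger` bases and pair with the neighbouring duplexes. Segment termini are
    left blunt here for simplicity.
    """
    n = len(segment)
    top_cuts = [round(i * n / pieces_per_strand) for i in range(pieces_per_strand + 1)]
    bottom_cuts = [0] + [cut + stagger for cut in top_cuts[1:-1]] + [n]
    top = [segment[a:b] for a, b in zip(top_cuts, top_cuts[1:])]
    # Bottom-strand oligos are written 5'->3', i.e. as reverse complements.
    bottom = [reverse_complement(segment[a:b]) for a, b in zip(bottom_cuts, bottom_cuts[1:])]
    return top, bottom


segment = "ATGC" * 57  # a hypothetical 228 bp segment, the length of segment A
top, bottom = split_segment(segment)
print([len(o) for o in top], [len(o) for o in bottom])  # [76, 76, 76] [80, 76, 72]
```

Each top/bottom pair forms a 72 bp duplex with 4-base single-stranded ends at the internal junctions. Annealing and ligating the three duplexes restores the full 228 bp segment.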

In the specific embodiment, each DNA segment is cloned into a pIC-20 vector for amplification of the DNA. The nucleotide sequence of each fragment is determined at this stage by the dideoxy method, using the recombinant phage DNA as templates and selected synthetic oligonucleotides as primers.

As exemplified herein and illustrated schematically in Figures 3 and 4, each segment (e.g., segment M) is individually excised at the flanking restriction sites from its cloning vector and spliced into the vector containing segment A. Most often, segments are added in pairs rather than singly to increase efficiency. Thus, the entire gene is constructed in the original plasmid harboring segment A. The nucleotide sequence of the entire gene is determined and found to correspond exactly to that shown in Figure 1.

In preferred embodiments, the synthetic Btt gene is expressed in plants at an enhanced level when compared to that observed with natural Btt structural genes. To that end, the synthetic structural gene is combined with a promoter functional in plants, the structural gene and the promoter region being in such position and orientation with respect to each other that the structural gene can be expressed in a cell in which the promoter region is active, thereby forming a functional gene. The promoter regions include, but are not limited to, bacterial and plant promoter regions. To express the promoter region/structural gene combination, the DNA segment carrying the combination is contained by a cell. Combinations which include plant promoter regions are contained by plant cells, which, in turn, may be contained by plants or seeds. Combinations which include bacterial promoter regions are contained by bacteria, e.g., Bt or E. coli. Those in the art will recognize that expression in types of microorganisms other than bacteria may in some circumstances be desirable and, given the present disclosure, feasible without undue experimentation.

The recombinant DNA molecule carrying a synthetic structural gene under promoter control can be introduced into plant tissue by any means known to those skilled in the art. The technique used for a given plant species or specific type of plant tissue depends on the known successful techniques. As novel means are developed for the stable insertion of foreign genes into plant cells and for manipulating the modified cells, skilled artisans will be able to select from known means to achieve a desired result. Means for introducing recombinant DNA into plant tissue include, but are not limited to, direct DNA uptake (Paszkowski, J. et al. (1984) EMBO J. 3:2717), electroporation (Fromm, M. et al. (1985) Proc. Natl. Acad. Sci. USA 82:5824), microinjection (Crossway, A. et al. (1986) Mol. Gen. Genet. 202:179), or T-DNA-mediated transfer from Agrobacterium tumefaciens to the plant tissue. There appears to be no fundamental limitation of T-DNA transformation to the natural host range of Agrobacterium. Successful T-DNA-mediated transformation of monocots (Hooykaas-Van Slogteren, G. et al. (1984) Nature 311:763), gymnosperms (Dandekar, A. et al. (1987) Biotechnology 5:587) and algae (Ausich, R., EPO application 108,580) has been reported. Representative T-DNA vector systems are described in the following references: An, G. et al. (1985) EMBO J. 4:277; Herrera-Estrella, L. et al. (1983) Nature 303:209; Herrera-Estrella, L. et al. (1983) EMBO J. 2:987; Herrera-Estrella, L. et al. (1985) in Plant Genetic Engineering, New York: Cambridge University Press, p. 63. Once introduced into the plant tissue, the expression of the structural gene may be assayed by any means known to the art, and expression may be measured as mRNA transcribed or as protein synthesized. Techniques are known for the in vitro culture of plant tissue and, in a number of cases, for regeneration into whole plants. Procedures for transferring the introduced expression complex to commercially useful cultivars are known to those skilled in the art.

In one of its preferred embodiments, the invention disclosed herein comprises expression in plant cells of a synthetic insecticidal structural gene under control of a plant-expressible promoter. This is achieved by inserting the insecticide structural gene into T-DNA under control of a plant-expressible promoter and introducing the T-DNA containing the insert into a plant cell using known means. Once plant cells expressing a synthetic insecticidal structural gene under control of a plant-expressible promoter are obtained, plant tissues and whole plants can be regenerated therefrom using methods and techniques well known in the art. The regenerated plants are then reproduced by conventional means, and the introduced genes can be transferred to other strains and cultivars by conventional plant breeding techniques.

The introduction and expression of the synthetic structural gene for an insecticidal protein can be used to protect a crop from infestation with common insect pests. Other uses of the invention, exploiting the properties of other insecticide structural genes introduced into other plant species, will be readily apparent to those skilled in the art. The invention in principle applies to introduction of any synthetic insecticide structural gene into any plant species into which foreign DNA (in the preferred embodiment T-DNA) can be introduced and in which said DNA can remain stably replicated. In general, these taxa presently include, but are not limited to, gymnosperms and dicotyledonous plants, such as sunflower (family Compositae), tobacco (family Solanaceae), alfalfa, soybeans and other legumes (family Leguminosae), cotton (family Malvaceae), and most vegetables, as well as monocotyledonous plants. A plant containing in its tissues increased levels of insecticidal protein will control less susceptible types of insect, thus providing an advantage over present insecticidal uses of Bt. By incorporating the insecticidal protein into the tissues of a plant, the present invention additionally provides an advantage over present uses of insecticides by eliminating instances of nonuniform application and the costs of buying and applying insecticidal preparations to a field. Also, the present invention eliminates the need for careful timing of the application of such preparations: small larvae are most sensitive to insecticidal protein, and since the protein is always present, crop damage that would otherwise result from preapplication larval foraging is minimized.

This invention combines the specific teachings of the present disclosure with a variety of techniques and expedients known in the art. The choice of expedients depends on variables such as the choice of insecticidal protein from a Bt strain, the extent of modification in preferred codon usage, manipulation of sequences considered to be destabilizing to RNA or sequences prematurely terminating transcription, insertion of restriction sites within the design of the synthetic gene to allow future nucleotide modifications, addition of introns or enhancer sequences to the 5' and/or 3' ends of the synthetic structural gene, the promoter region, the host in which a promoter region/structural gene combination is expressed, and the like. As novel insecticidal proteins and toxic polypeptides are discovered, and as sequences responsible for enhanced cross-expression (expression of a foreign structural gene in a given host) are elucidated, those of ordinary skill will be able to select among those elements to produce "improved" synthetic genes for desired proteins having agronomic value. The fundamental aspect of the present invention is the ability to synthesize a novel gene coding for an insecticidal protein, designed so that the protein will be expressed at an enhanced level in plants, yet will retain its inherent property of insect toxicity and retain or increase its specific insecticidal activity.



EXAMPLES 

The following Examples are presented as illustrations of embodiments of the present invention. They do not limit the scope of this invention, which is determined by the claims.

The following strains were deposited with the Patent Culture Collection, Northern Regional Research Center, 1815 N. University Street, Peoria, Illinois 61604.



Strain                           Deposited on      Accession No.
E. coli MC1061 (p544-HindIII)    8 October 1987    NRRL B-18257
E. coli MC1061 (p544Pst-Met5)    8 October 1987    NRRL B-18258

The deposited strains are provided for the convenience of those in the art and are not necessary to practice the present invention, which may be practiced with the present disclosure in combination with publicly available protocols, information, and materials. E. coli MC1061, a good host for plasmid transformations, was disclosed by Casadaban, M.J. and Cohen, S.N. (1980) J. Mol. Biol. 138:179-207.



Example 1: Design of the synthetic insecticidal crystal protein gene.






(i) Preparation of toxic subclones of the Btt gene

Construction, isolation, and characterization of pNSBP544 are disclosed by Sekar, V. et al. (1987) Proc. Natl. Acad. Sci. USA 84:7036-7040, and by Sekar, V. and Adang, M.J., U.S. patent application serial no. 108,285, filed October 15, 1987, which is hereby incorporated by reference. A 3.0 kbp HindIII fragment carrying the crystal protein gene of pNSBP544 is inserted into the HindIII site of pIC-20H (Marsh, J.L. et al. (1984) Gene 32:481-485), thereby yielding a plasmid designated p544-HindIII, which is on deposit. Expression in E. coli yields a 73 kDa crystal protein in addition to the 65 kDa species characteristic of the crystal protein obtained from Btt isolates.

A 5.9 kbp BamHI fragment carrying the crystal protein gene is removed from pNSBP544 and inserted into BamHI-linearized pIC-20H DNA. The resulting plasmid, p405/44-7, is digested with BglII and religated, thereby removing Bacillus sequences flanking the 3'-end of the crystal protein gene. The resulting plasmid, p405/54-12, is digested with PstI and religated, thereby removing Bacillus sequences flanking the 5'-end of the crystal protein gene and about 140 bp from the 5'-end of the crystal protein structural gene. The resulting plasmid, p405/81-4, is digested with SphI and PstI and is mixed with and ligated to a synthetic linker having the following structure:

        SD         MetThrAla
5'     CAGGATCCAACAATGACTGCA 3'
3' GTACGTCCTAGGTTGTTACTG     5'
                        PstI



(SD indicates the location of a Shine-Dalgarno prokaryotic ribosome binding site.) The resulting plasmid, p544Pst-Met5, contains a structural gene encoding a protein identical to one encoded by pNSBP544 except for a deletion of the amino-terminal 47 amino acid residues. The nucleotide sequence of the Btt coding region in p544Pst-Met5 is presented in Figure 1. In bioassays (Sekar and Adang, U.S. patent application serial no. 108,285, supra), the proteins encoded by the full-length Btt gene in pNSBP544 and the N-terminal deletion derivative, p544Pst-Met5, were shown to be equally toxic. All of the plasmids mentioned above have their crystal protein genes in the same orientation as the lacZ gene of the vector.

(ii) Modification of preferred codon usage



Table 1 presents the frequency of codon usage for (A) dicot proteins, (B) Bt proteins, (C) the synthetic Btt gene, and (D) monocot proteins. Although some codons for a particular amino acid are utilized to approximately the same extent by both dicot and Bt proteins (e.g., the codons for serine), for the most part the distribution of codon frequency varies significantly between dicot and Bt proteins, as illustrated in columns A and B of Table 1.

Table 1. Frequency of Codon Usage

                      Distribution Fraction
Amino          (A) Dicot   (B) Bt    (C) Synthetic   (D) Monocot
Acid   Codon   Genes       Genes     Btt Gene        Genes
Gly    GGG     0.12        0.08      0.13            0.21
Gly    GGA     ?           ?         0.37            0.18
Gly    GGT     ?           0.24      0.34            0.21
Gly    GGC     0.16        0.16      0.16            0.40
Glu    GAG     ?           0.13      0.52            0.77
Glu    GAA     ?           ?         0.48            0.23
Asp    GAT     ?           ?         0.56            0.31
Asp    GAC     0.43        0.32      0.44            0.69
Val    GTG     0.30        0.15      0.30            ?
Val    GTA     0.12        ?         ?               0.07
Val    GTT     ?           ?         ?               0.20
Val    GTC     0.20        0.24      0.25            0.34
Ala    GCG     0.05        0.12      ?               0.20
Ala    GCA     ?           0.50      ?               ?
Ala    GCT     0.42        ?         0.41            0.28
Ala    GCC     0.28        0.06      0.29            0.36
Lys    AAG     ?           0.13      ?               ?
Lys    AAA     0.39        0.87      0.42            ?
Asn    AAT     0.45        0.79      ?               ?
Asn    AAC     0.55        0.21      0.56            0.77
Met    ATG     1.00        1.00      1.00            1.00
Ile    ATA     ?           ?         0.20            0.09
Ile    ATT     ?           0.57      0.43            0.27
Ile    ATC     0.36        0.13      0.37            0.64
Thr    ACG     0.07        0.14      0.07            0.18
Thr    ACA     0.27        0.68      0.27            0.14
Thr    ACT     0.36        0.14      0.34            0.22
Thr    ACC     0.31        0.05      0.32            0.47
Trp    TGG     1.00        1.00      1.00            1.00
End    TGA     0.46        0.00      0.00            0.34
Cys    TGT     0.43        0.33      0.33            0.27
Cys    TGC     0.57        0.67      0.67            0.73
End    TAG     0.18        0.00      0.00            0.44
End    TAA     0.37        1.00      1.00            0.22
Tyr    TAT     0.42        0.81      0.43            0.19
Tyr    TAC     0.58        0.19      0.57            0.81
Phe    TTT     0.45        0.75      0.44            0.28
Phe    TTC     0.55        0.25      0.56            0.72
Ser    AGT     0.14        0.25      0.13            0.07
Ser    AGC     0.18        0.13      0.19            0.25
Ser    TCG     0.05        0.08      0.06            0.13
Ser    TCA     0.18        0.19      0.17            0.13
Ser    TCT     0.26        0.25      0.27            0.18
Ser    TCC     0.19        0.10      0.17            0.24
Arg    AGG     0.22        0.09      0.23            0.28
Arg    AGA     0.31        0.50      0.32            0.08
Arg    CGG     0.04        0.14      0.05            0.14
Arg    CGA     0.09        0.14      0.09            ?
Arg    CGT     0.23        0.09      0.23            0.11
Arg    CGC     0.11        0.05      0.09            0.36
Gln    CAG     0.38        0.18      0.39            0.43
Gln    CAA     0.62        0.82      0.61            0.57
His    CAT     0.52        0.90      0.50            0.38
His    CAC     0.48        0.10      0.50            0.62
Leu    TTG     0.26        0.08      0.27            0.15
Leu    TTA     0.10        0.46      0.12            0.04
Leu    CTG     0.09        0.04      0.10            0.27
Leu    CTA     0.08        0.21      0.10            0.11
Leu    CTT     0.29        0.15      0.18            0.16
Leu    CTC     0.19        0.06      0.22            0.27
Pro    CCG     0.07        0.20      0.08            0.20
Pro    CCA     0.44        0.56      0.44            0.39
Pro    CCT     0.32        0.24      0.32            0.19
Pro    CCC     0.16        0.00      0.16            0.22
Publicly available Bt coding sequences and 88 coding sequences of dicot nuclear genes were used to compile the codon usage table. The pooled dicot coding sequences, obtained from Genbank, were:

Table 1 (continued)

GENUS/SPECIES

GENBANK

PROTEIN

REF



Antirrhinum majus
Arabidopsis thaliana

Bertholletia excelsa

Brassica napus
Brassica oleracea
Canavalia ensiformis
Carica papaya
Chlamydomonas reinhardtii

Cucurbita pepo
Cucumis sativus

Daucus carota

Dolichos biflorus
Flaveria trinervia
Glycine max



AMACHS 

AT1UDU 

ATHK3GA 

ATHU3GB 

ATOH4CA 

ATllUtCri 

ATirruBA 



BNANAP 
BOLSLSGR 
CENCONA 
CPAPAP 

CREC552 

CRERBCS1 

CKERBCS2 

cucpirr 

CUSCMS 
CUSLHCPA 

cusssu 

DAKEJCT 

DAREXTR 

DBILECS 

FTR8CR 

SOY7SAA 

SOVACT1G 

sovcnpi 
soycmau 
soyclyaab 
soycutab 

SOYGLYR 

SOYHSPI75 

SOYLGBI 

SOYLEA 

SOYLOX 

SOYKOD20G 

SOYKOD23G 

SOYNOD24II 

SOYNODWD 

SOYKOD26R 

SOYNOD27R 

SOYNOD35M 

SOYNC07S 

SOYNOORX 

SOYNODR2 

SOYTRPl 

SOYRUBP 

SOYURA 

SOYItSPWA 



Otaiconc synthetase 

Alcohol dehydrogenase 

Hiswmc3gcnel 

Histone3gene2 

Histone4gene t 

CAD 

a tubulin 

5<noJpyruvyWhifate 3-phosphaie 
synthetase 

High methionine storage protein 

Aetf earner protein 

Napm 

S4ocus specific fetoprotein 
Coacanavatin A 
Papain 

Prcapexytochromc 
RuDPC small subunit gene 1 
RuBPC small wbunit gene 2 
Photochrome 

Gtyotosomal matate synthetase 
CAB 

RuBPC small subuntt 
Eaensin 

33 kO extern ta related protein 

seed lectin 

fUiBPC small subuoit 

7S storage protein 

Actio 1 

Ol protease tolubfror 
QycinbAU Bi submits 
Glycmin A5A403 sub unto 
Ctjcinio A3/b4 subunits 
O)dnmA20Ja svbunhs 
LowM Wheat shock proteins 
Lc^bcmogjtabtn 
Lectin 

Uporyjenase 1 
20 kDa nodulin 
23kOanodufin 

34 kDa nodiHin 
26 kDa nodulin 

26 kDa nodulin 

27 kOa nodulin 

35 kDa nodulin 
7$ kDa nodulin 
Nodulin Ol 
Nodulin E27 
Proline rich protein 
RuBPC small subtmh 
Urease 

Heal shock protein 26A 
hHidear-encoded chloropUst 
heal shock protein 
22 kDa nodulin 
0\ tubulin 
fit tubulin 



2
3

5
6
6

ItrtumOms ennui
tpomtxo 
Lupinus luteus


Afcrfteigosnmo 
Ma&ttbtycntlMButfti 
oyyf Ritttttn 

Nkotfann 




Petunia sp.




Phaseolus vulgaris






iiKNKuncs 



LCIAD19 

LCIR5BPC 

LUPLBR 

TOMB 10 OR 

TOMCTirVDF 

T0MPC2AR 

TOMPSI 

TOMRBCSA 

TOMRBCSB 

TOMRBCSC 

TOMRBCSD 

TOMRRD 

TONWIPIC 

TOMWini 



ALFLB3R 



TOBATPlt 



TOOECH 

TO&GAPA 

TODGAPU 

TODCAPC 

TOBPR1AR 

TOUPRtCR 

TOBPRPR 

TOOrXOLF 

TOORBPCO 

TOBTIUUP 

AVOCEL 

PTIOCHL 

rercABu 

PCTCABUl, 
PCTCAB22T 
PETCAB2S 
PETCAU37 

PCTCHSR 
PCTGCRI 
PCTRBCSNi 
PETRBCSt 

piivchm 

PltVDLEO 
ntVDLECIl 
fltVGSRI 
PHVGSR2 



PKCMTXN 



Mo ptubufcn (victim) 

RuUl'C small subumt 

3 albumin seed storage proiciM 

Wound-induced catalasc 

CAD 

RuBPC small subunit 
il 



7 



H 
1 



ttottn binding protein 
Ethylene biosynthesis protein 
Por^alsctu rooasc-2a 
Tomato photosystem I protein 
RuBPC small submit 
RuBPC small subunit 
RuBPC small subunit 
RuBPC small subunit 
Ripening related protein 
Wound induced proteinase 
inhibitor I 

Wound induced proteinase 

inhibitor U 

CADI A 

CAD )0 

CA03C 

CAD4 

CADS 

Leghcmogtobm IU 
RuOPCsmaQ subunit 
Mitochondrial ATP synthase 



Nhrate reductase 
Qutamme 



10 
10 
10 

11 
11 

12 



13 
14 



A subunit of Chtofoptest G3PD 
OsubcnitcfchloropUst C3PD 
Csobimit of chtaoptasi CJP O 
Pathogenesis related protein la 
PaAogcoesis-fcUted proteinic 
Pathogenesis related protein lb 
Peroxidase 

RuBPC Small subunit 
TMV4nduced protein homologous 
tothaumatin 
Cegn tasc 

Qulcone synthase 

CAD 13 

CAD 221 

CAD 23 R 

CAD 25 

CAD 37 

CA091R 

Chatamc synthase 

Qycfoe*rich protein 

RuOPCsmaQ subunit 

RuDPC smaH subunit 

TOUDaheat shod: protein 

Chitinase 

Phytohemagglutinin C 
PhyiohemaggluUnin L 
Gtuumine synthetase I 
Glutamine synthetases

Table 1 (continued)

GENUS/SPECIES

GENBANK

PROTEIN



wt jji\«nn 

Pisum sativum

Raphanus sativus

Silene pratensis
Sinapis alba

Spinacia oleracea

Vicia faba



ruvuu 

PIIVUvCT 
PHVPAL 
PIIVPUASAR 
PUVTKASBR 



PEAALD2 

PEACAB80 

PEAGSR1 

PEALECA 

PEAtEGA 

PEARUBPS 

PEAVia 

FEAVIC4 

PEAV1C7 



RCCACC 

RCCRION 

RCQCW 

SIPfBX 

SIPPCY 

SAUSAPD1I 

POIPAT 

pomKmvi 

POTLStG 
POTVUC 

poroses 

SPIACF! 

snoECi* 

SPKOEQ3 

SPIPCG 
SPIPS33 



VFALBA 
VFALEW 



Lectin

Phenylalanine ammonia lyase

α-phaseolin

β-phaseolin

Arcelin seed protein

Chalcone synthase

Seed albumin

CAS

Glutamine synthetase (nodule)

Lectin

Legumin

RuBPC small subunit

Vicilin

Vicilin

Vicilin

Alcohol dehydrogenase 1
Glutamine synthetase (leaf)
Glutamine synthetase (root)
Ktenel

Nuclear-encoded chloroplast
heat shock protein
RuBPC small subunit
Agglutinin
Rfctn

Isocitrate lyase
Pencdcaan precursor
Plastocyanin precursor
Nuclear gene for G3PD
Patatin



inhibitor

U#t4adtsefc!< Usee specific
ST4Slgenc

Wound-induced proteinase
inhibitor II
RuBPC small subunit
Sucrose synthetase
Acyl carrier protein I
16 kDa photosynthetic
oxygen-evolving protein
23 kDa photosynthetic



Plastocyanin
33 kDa photosynthetic
oddatiM compka precursor
Glycolate oxidase
Leghemoglobin
LecumlnD
Vicilin



REF



16 
17 



18 
19 

21 



22 



23 
24 



Pooled 53 monocot coding sequences obtained from GenBank
(release 55) or, when no GenBank file name is specified,
directly from the published source, were:
Table 1 (CONTINUED)



GENUS/SPECIES GENBANK



Avena sativa
Hordeum vulgare



Oryza sativa
Triticum aestivum



Secale cereale
Zea mays



PROTEIN 



REF 



ASTAP3R 
DLVALR 
BLVAMYl 
BLYAMY2 
BLYCHORDl 
BLYCLUCB 
BLYKORB 
BIYPAPt 
BLYTMAR 
BLYUB1QR 



RICCLUTC 
WKTAMYA 

wirrcAB 

WKTEMR 
WKTCIR 
WHTCLCB 
WHTCUAB^ 
WOTCLUn 
\VKTW 
WKTH4091 
WHTRBCT 
KYESECGSp 
MZEAtC 

MZEACTIC! 
MZEADHI] F 
MZEADH2KR 
MZEALD 
MZEAKT 
MZEEG2R 
MIECGST?B 

mzdoo 

MZEiUCH 

mzehstiui 

MZEIlSPTtt 
MZEUICT 
MZEMPU 
MfcEPEPCR 
MZERBCI 

Mzesusvsc 
stum 

MZEZEAlOM 
MZEZEAlOM 
MZEZE15A3 
MZEZEU 
MXEZEWA 
MZEZEZ A 
MZEZE2H8 



Phytochrome 3

Aleurain

α-amylase 1

α-amylase 2

Hordein C

β-glucanase

B1 hordein

Amylase/protease inhibitor

Toxin α-hordothionin

Ubiquitin

Histone 3

Leaf specific thionin 1

Leaf specific thionin 2

Plastocyanin

Glutelin

Glutelin

α-amylase

CAB

Em protein

Gibberellin responsive protein
«70tadfo

α/β gliadin class AII
High MW glutenin
Histone 3
Histone 4

RuBPC small subunit
^cecaltd

40.1 kD A1 protein (NADPH-

dependent reductase)

Actin

Alcohol dehydrogenase 1
Alcohol dehydrogenase 2
Aldolase

ATP/ADP translocator
Qttteli»2

Glutathione S-transferase

Histone 3

Histone 4

70 kD heat shock protein, exon 1
70 kD heat shock protein, exon 2
CAO

Lipid body surface protein U

Phosphoenolpyruvate carboxylase

RuBPC small subunit

Sucrose synthetase

Triosephosphate isomerase 1

ttkDxrift

19 kD zein

15 kD zein

16 kD zein

19 kD zein

22 kD zein

22 kD zein

Catalase 2

Regulatory O locus



25 
26 

27

28 



29 
30 
Table 1 (CONTINUED)

Bt codons were obtained from analysis of coding sequences
of the following genes: Bt var. kurstaki HD-73, 6.6 kb
HindIII fragment (Kronstad et al. (1983) J. Bacteriol.
154:419-428); Bt var. kurstaki HD-1, 5.3 kb fragment (Adang
et al. (1987) in Biotechnology in Invertebrate Pathology
and Cell Culture, K. Maramorosch (ed.), Academic Press, Inc.,
New York, pp. 85-99); Bt var. kurstaki HD-1, 4.5 kb
fragment (Schnepf and Whiteley (1985) J. Biol. Chem.
260:6273-6280); and Bt var. tenebrionis 3.0 kb HindIII
fragment (Sekar et al. (1987) Proc. Natl. Acad. Sci. USA
84:7036-7040).



REFERENCES

1. Klee, H.J. et al. (1987) Mol. Gen. Genet. 210:437-442.

2. Altenbach, S.B. et al. (1987) Plant Mol. Biol. 8:239-250.

3. Rose, R.E. et al. (1987) Nucl. Acids Res. 15:7197.

4. Vierling, E. et al. (1988) EMBO J. 7:575-581.

5. Sandal, N.N. et al. (1987) Nucl. Acids Res. 15:1507-1519.

6. Tingey, S.V. et al. (1987) EMBO J. 6:1-9.

7. Chlan, C.A. et al. (1987) Plant Mol. Biol. 9:533-546.

8. Allen, R.D. et al. (1987) Mol. Gen. Genet. 210:211-218.

9. Sakajo, S. et al. (1987) Eur. J. Biochem. 165:437-442.

10. Pichersky, E. et al. (1987) Plant Mol. Biol. 9:109-120.

11. Ray, J. et al. (1987) Nucl. Acids Res. 15:10587.

12. DeRocher, E.J. et al. (1987) Nucl. Acids Res. 15:6301.

13. Calza, R. et al. (1987) Mol. Gen. Genet. 202:552-562.

14. Tingey, S.V. and Coruzzi, G.M. (1987) Plant Physiol.
84:366-373.

15. Winter, J. et al. (1988) Mol. Gen. Genet. 211:315-319.

16. Osborn, T.C. et al. (1988) Science 240:207-210.
Table 1 (CONTINUED)

17. Llewellyn, D.J. et al. (1987) J. Mol. Biol. 195:115-123.

18. Ryder, T.B. et al. (1987) Mol. Gen. Genet. 210:219-233.

19. Tingey, S.V. et al. (1987) EMBO J. 6:1-9.

20. Gantt, J.S. and Key, J.L. (1987) Eur. J. Biochem. 166:119-125.

21. Guidet, F. and Fourcroy, P. (1988) Nucl. Acids Res. 16:2336.

22. Salanoubat, M. and Belliard, G. (1987) Gene 60:47-56.

23. Volokita, M. and Somerville, C.R. (1987) J. Biol. Chem. 262:15825-15828.

24. Bassner, R. et al. (1987) Nucl. Acids Res. 15:9609.

25. Chojecki, J. (1986) Carlsberg Res. Commun. 51:211-217.

26. Bohlmann, H. and Apel, K. (1987) Mol. Gen. Genet. 207:446-454.

27. Nielsen, P.S. and Gausing, K. (1987) FEBS Lett. 225:159-162.

28. Higuchi, W. and Fukazawa, C. (1987) Gene 55:245-253.

29. Bethards, L.A. et al. (1987) Proc. Natl. Acad. Sci. USA 84:6830-6834.

30. Paz-Ares, J. et al. (1987) EMBO J. 6:3553-3558.

For example, dicots utilize the AAG codon for lysine with a frequency of 61% and the AAA codon with a frequency of 39%. In contrast, in Bt proteins the lysine codons AAG and AAA are used with a frequency of 13% and 87%, respectively. It is known in the art that seldom used codons are generally detrimental to that system and must be avoided or used judiciously. Thus, in designing a synthetic gene encoding the Btt crystal protein, individual amino acid codons found in the original Btt gene are altered to reflect the codons preferred by dicot genes for a particular amino acid. However, attention is given to maintaining the overall distribution of codons for each amino acid within the coding region of the gene. For example, in the case of alanine, it can be seen from Table 1 that the codon GCA is used in Bt proteins with a frequency of 50%, whereas the codon GCT is the preferred codon in dicot proteins. In designing the synthetic Btt gene, not all codons for alanine in the original Bt gene are replaced by GCT; instead, only some alanine codons are changed to GCT while others are replaced with different alanine codons in an attempt to preserve the overall distribution of codons for alanine used in dicot proteins. Column C in Table 1 documents that this goal is achieved; the frequency of codon usage in dicot proteins (column A) corresponds very closely to that used in the synthetic Btt gene (column C).

In similar manner, a synthetic gene coding for insecticidal crystal protein can be optimized for enhanced expression in monocot plants. Table 1, column D, presents the frequency of codon usage of highly expressed monocot proteins.

Because of the degenerate nature of the genetic code, only part of the variation contained in a gene is expressed in this protein. It is clear that variation between degenerate base frequencies is not a neutral phenomenon, since systematic codon preferences have been reported for bacterial, yeast and mammalian genes. Analysis of a large group of plant gene sequences indicates that synonymous codons are used differently by monocots and dicots. These patterns are also distinct from those reported for E. coli, yeast and man.
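The codon-rebalancing approach described above for alanine can be sketched in Python. This is a minimal illustration under stated assumptions, not the inventors' actual procedure: `codon_usage` tabulates per-amino-acid codon percentages of the kind reported in Table 1 from in-frame coding sequences, and `redistribute` recodes one amino acid's codons so their overall mix approximates a target distribution while keeping as many original codons as possible. All function names, and the target percentages in the usage note, are hypothetical and are not values taken from Table 1.

```python
from collections import Counter, defaultdict

BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_usage(coding_sequences):
    """Percent usage of each synonymous codon, per amino acid, pooled over
    in-frame coding sequences (codons containing ambiguous bases are skipped)."""
    counts = defaultdict(Counter)
    for seq in coding_sequences:
        seq = seq.upper()
        for pos in range(0, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if codon in CODON_TABLE:
                counts[CODON_TABLE[codon]][codon] += 1
    return {aa: {c: 100.0 * n / sum(cc.values()) for c, n in cc.items()}
            for aa, cc in counts.items()}


def target_counts(n, target_pct):
    """Split n positions among codons in proportion to target_pct (assumed to
    sum to 100), using largest-remainder rounding so the counts sum to n."""
    raw = {c: n * p / 100.0 for c, p in target_pct.items()}
    counts = {c: int(v) for c, v in raw.items()}
    leftover = n - sum(counts.values())
    # Give the remaining positions to the codons with the largest fractional parts.
    for c in sorted(raw, key=lambda c: raw[c] - counts[c], reverse=True)[:leftover]:
        counts[c] += 1
    return counts


def redistribute(codons, target_pct):
    """Recode a list of synonymous codons (one amino acid, in gene order) so that
    their overall usage approximates target_pct, changing as few as possible.
    Other design constraints (restriction sites, destabilizing motifs) are ignored."""
    quota = target_counts(len(codons), target_pct)
    result = [None] * len(codons)
    # First pass: keep an original codon wherever its quota is not yet used up.
    for i, c in enumerate(codons):
        if quota.get(c, 0) > 0:
            result[i] = c
            quota[c] -= 1
    # Second pass: fill the remaining positions from codons with spare quota.
    spare = [c for c, k in quota.items() for _ in range(k)]
    for i in range(len(result)):
        if result[i] is None:
            result[i] = spare.pop(0)
    return result
```

For instance, eight alanine codons (four GCA, two GCT, two GCG) rebalanced toward an illustrative 50/25/25 split of GCT/GCC/GCA keep the first two GCA codons and both GCT codons, and the remaining four positions become GCT, GCT, GCC and GCC. Keeping existing codons first mirrors the text's aim of changing only some codons rather than replacing all of them with the single preferred one.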

In general, the plant codon usage pattern more closely resembles that of man and other higher eukaryotes than that of unicellular organisms, owing to the overall preference for G+C content in codon position III.

Monocots in this sample share the most commonly used codon for 13 of 18 amino acids with that reported for a sample of human genes (Grantham et al. (1988) supra), whereas dicots favor the most commonly used human codon for only 7 of 18 amino acids.

Discussions of plant codon usage have focused on the differences between codon choice in plant nuclear genes and in chloroplasts. Chloroplasts differ from higher plants in that they encode only 30 tRNA species. Since chloroplasts have restricted their tRNA genes, the use of preferred codons by chloroplast-encoded proteins appears more extreme. However, a positive correlation has been reported between the level of isoaccepting tRNA for a given amino acid and the frequency with which this codon is used in the chloroplast genome (Pfitzinger et al. (1987) Nucl. Acids Res. 15:1377-1388).

Our analysis of the plant gene sample confirms earlier reports that the nuclear and chloroplast genomes in plants have distinct coding strategies. The codon usage of monocots in this sample is distinct from chloroplast usage, sharing the most commonly used codon for only 1 of 18 amino acids. Dicots in this sample share the most commonly used codon of chloroplasts for only 4 of 18 amino acids. In general, the chloroplast codon profile more closely resembles that of unicellular organisms, with a strong bias towards the use of A+T in the degenerate third base.

In unicellular organisms, highly expressed genes use a smaller subset of codons than do weakly expressed genes, although the codons preferred are distinct in some cases. Sharp and Li (1986) Nucl. Acids Res. 14:7734-7749 report that codon usage in 165 E. coli genes reveals a positive correlation between high expression and increased codon bias. Bennetzen and Hall (1982) supra have described a similar trend in codon selection in yeast. Codon usage in these highly expressed genes correlates with the abundance of isoaccepting tRNAs in both yeast and E. coli. It has been proposed that the good fit of abundant yeast and E. coli mRNA codon usage to isoacceptor tRNA abundance promotes high translation levels and high steady-state levels of these proteins. This strongly suggests that the potential for high levels of expression of plant genes in yeast or E. coli is limited by their codon usage. Hoekema et al. (1987) supra report that replacement of the 25 most favored yeast codons with rare codons in the 5' end of the highly expressed gene PGK1 leads to a decrease in both mRNA and protein. These results indicate that codon bias should be emphasized when engineering high expression of foreign genes in yeast and other systems.



B) Sequences within the Btt coding region having potentially destabilizing influences

Analysis of the Btt gene reveals that A+T represents 64% of the DNA base composition of the coding region; this level of A+T is about 10% higher than that found in a typical plant coding region. Most often, high A+T regions are found in intergenic regions. Also, many plant regulatory sequences are observed to be AT-rich. These observations lead to the consideration that an elevated A+T content within the Btt coding region may be contributing to a low expression level in plants. Consequently, in designing a synthetic Btt gene, the A+T content is decreased to more closely approximate the A+T levels found in plant genes. As illustrated in Table 3, the A+T content is lowered to a level in keeping with that found in coding regions of plant nuclear genes. The synthetic Btt gene of this invention has an A+T content of 55%.


Table 3

Adenine + Thymine Content in Btt Coding Region

Coding Region         Base                       %G+C   %A+T
                      G     A     T     C
Natural Btt gene      341   633   514   308      36     64
Synthetic Btt gene    392   530   483   428      45     55
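The percentages in Table 3 follow directly from the base counts. The short Python sketch below recomputes them as a consistency check; the helper name `base_composition` is illustrative, and the counts are the ones given in the table.

```python
def base_composition(counts):
    """Return (%G+C, %A+T) from a dict of base counts such as {'G': 341, ...}."""
    total = sum(counts[b] for b in "GATC")
    gc = 100.0 * (counts["G"] + counts["C"]) / total
    return gc, 100.0 - gc


natural = {"G": 341, "A": 633, "T": 514, "C": 308}
synthetic = {"G": 392, "A": 530, "T": 483, "C": 428}

for name, c in (("Natural Btt gene", natural), ("Synthetic Btt gene", synthetic)):
    gc, at = base_composition(c)
    print(f"{name}: {sum(c.values())} bases, G+C {gc:.0f}%, A+T {at:.0f}%")
```

Rounded to whole percentages, the results match the table: 36% G+C and 64% A+T for the natural gene (1,796 bases counted), and 45% G+C and 55% A+T for the synthetic gene (1,833 bases counted).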
In addition, the natural Btt gene is scanned for sequences that are potentially destabilizing to Btt RNA. These sequences, when identified in the original Btt gene, are eliminated through modification of nucleotide sequences. Included in this group of potentially destabilizing sequences are:

(a) plant polyadenylation signals (as described by Joshi (1987) Nucl. Acids Res. 15:9627-9640). In eukaryotes, the primary transcripts of nuclear genes are extensively processed (steps including 5'-capping, intron splicing and polyadenylation) to form mature and translatable mRNAs. In higher plants, polyadenylation involves endonucleolytic cleavage at the polyA site followed by the addition of several A residues to the cleaved end. The selection of the polyA site is presumed to be cis-regulated. During expression of Bt protein and RNA in different plants, the present inventors have observed that the polyadenylated mRNA isolated from these expression systems is not full-length but instead is truncated or degraded. Hence, in the present invention it was decided to minimize possible destabilization of RNA through elimination of potential polyadenylation signals within the coding region of the synthetic Btt gene. Plant polyadenylation signals including AATAAA, AATGAA, AATAAT, AATATT, GATAAA, GATAAA and AATAAG motifs do not appear in the synthetic Btt gene when it is scanned for these sequences with 0 mismatches.

(b) polymerase II termination sequence, CAN7-9AGTNNAA. This sequence was shown (Vankan and Filipowicz (1988) EMBO J. 7:791-799) to be next to the 3' end of the coding region of the U2 snRNA genes of Arabidopsis thaliana and is believed to be important for transcription termination upon 3' end processing. The synthetic Btt gene is devoid of this termination sequence.

(c) CUUCGG hairpins, responsible for extraordinarily stable RNA secondary structures associated with various biochemical processes (Tuerk et al. (1988) Proc. Natl. Acad. Sci. 85:1364-1368). The exceptional stability of CUUCGG hairpins suggests that they have an unusual structure and may function in organizing the proper folding of complex RNA structures. CUUCGG hairpin sequences are not found with either 0 or 1 mismatches in the Btt coding region.

(d) plant consensus splice sites, 5' = AAG:GTAAGT and 3' = TTTT(Pu)TTT(Pu)T(Pu)T(Pu)T(Pu)TGCAG:C, as described by Brown et al. (1988) EMBO J. 5:2749-2758. Consensus sequences for the 5' and 3' splice junctions have been derived from 20 and 30 plant intron sequences, respectively. Although it is not likely that such potential splice sequences are present in Bt genes, a search was initiated for sequences resembling plant consensus splice sites in the synthetic Btt gene. For the 5' splice site, the closest match was with three mismatches. This gave 12 sequences, of which two had GkGT. Only position 948 was changed, because position 1323 has the KpnI site needed for reconstruction. The 3' splice site is not found in the synthetic Btt gene.

Thus, by highlighting potential RNA-destabilizing sequences, the synthetic Btt gene is designed to eliminate known eukaryotic regulatory sequences that affect RNA synthesis and processing.
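The scans in (a) to (d) amount to searching a sequence for fixed motifs with a permitted number of mismatches, plus one degenerate pattern. The Python sketch below is an illustration under stated assumptions, not the inventors' software: it counts substitutions only (no insertions or deletions), searches only the strand it is given (the text does not say whether the reverse complement was also scanned), writes the RNA hairpin CUUCGG as DNA (CTTCGG), and uses hypothetical function names.

```python
import re


def hamming_hits(seq, motif, max_mismatches=0):
    """Return 0-based start positions where seq matches motif with at most
    max_mismatches substitutions (no insertions or deletions)."""
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    return [
        i for i in range(len(seq) - k + 1)
        if sum(a != b for a, b in zip(seq[i:i + k], motif)) <= max_mismatches
    ]


# Plant polyadenylation motifs from (a); GATAAA, listed twice there, is kept once.
POLYA_MOTIFS = ["AATAAA", "AATGAA", "AATAAT", "AATATT", "GATAAA", "AATAAG"]

# CAN(7-9)AGTNNAA from (b); the lookahead lets overlapping hits be reported.
POL2_TERM = re.compile(r"(?=CA[ACGT]{7,9}AGT[ACGT]{2}AA)")


def scan(seq):
    """Collect hits for the motif classes in (a) to (d) on the given strand."""
    seq = seq.upper()
    hits = {m: hamming_hits(seq, m, 0) for m in POLYA_MOTIFS}
    hits["CAN7-9AGTNNAA"] = [m.start() for m in POL2_TERM.finditer(seq)]
    # CUUCGG hairpin written as DNA, allowing 0 or 1 mismatch as in (c).
    hits["CTTCGG"] = hamming_hits(seq, "CTTCGG", 1)
    # 5' splice consensus AAG:GTAAGT (colon dropped), up to three mismatches as in (d).
    hits["AAGGTAAGT"] = hamming_hits(seq, "AAGGTAAGT", 3)
    return hits
```

A design that passes the checks in the text would return empty lists for the polyadenylation motifs, the termination pattern and the hairpin; hits for the splice consensus are then judged case by case, as was done for positions 948 and 1323.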






Example 2. Chemical synthesis of a modified Btt structural gene 






(i) Synthesis Strategy

The general plan for synthesizing linear double-stranded DNA sequences coding for the crystal protein from Btt is schematically simplified in Figure 2. The optimized DNA coding sequence (Figure 1) is divided into thirteen segments (segments A-M) to be synthesized individually, isolated and purified. As shown in Figure 2, the general strategy begins by enzymatically joining segments A and M to form segment AM, to which segment BL is added to form segment ABLM. Segment CK is then added enzymatically to make segment ABCKLM, which is enlarged through addition of segments DJ, EI and FGH sequentially to give finally the total segment ABCDEFGHIJKLM, representing the entire coding region of the Btt gene.

Figure 3 outlines in more detail the strategy used in combining individual DNA segments in order to effect the synthesis of a gene having unique restriction sites integrated into a defined nucleotide sequence. Each of the thirteen segments (A to M) has unique restriction sites at both ends, allowing the segment to be strategically spliced into a growing DNA polymer. Also, unique sites are placed at each end of the gene to enable easy transfer from one vector to another.

The thirteen segments (A to M) used to construct the synthetic gene vary in size. Oligonucleotide pairs of approximately 75 nucleotides each are used to construct larger segments of approximately 225 nucleotide pairs. Figure 3 documents the number of base pairs contained within each segment and specifies the unique restriction sites bordering each segment. The overall strategy for incorporating specific segments at appropriate splice sites is also detailed in Figure 3.
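The assembly order in the synthesis strategy works from the outside in: the two terminal segments are joined first, and each subsequent block is spliced into the middle of the growing construct. The Python sketch below models only that ordering (no restriction chemistry); the function name `outside_in` and the rule that each block contributes its first letter to the left arm and the rest to the right arm are assumptions chosen to reproduce the intermediates named in the text.

```python
def outside_in(additions):
    """Return the intermediate constructs of an outside-in assembly.

    Each block contributes its first segment to the left arm and its remaining
    segments to the right arm, so later blocks land in the middle of the construct.
    """
    left, right = "", ""
    steps = []
    for block in additions:
        left += block[0]
        right = block[1:] + right
        steps.append(left + right)
    return steps


# Blocks in the order described: AM, then BL, CK, DJ, EI and finally FGH.
for n, construct in enumerate(outside_in(["AM", "BL", "CK", "DJ", "EI", "FGH"]), 1):
    print(n, construct)
```

The intermediates are AM, ABLM, ABCKLM, ABCDJKLM, ABCDEIJKLM and finally ABCDEFGHIJKLM, the complete coding region. Because every new block enters at a single internal splice site, each step needs only one pair of unique restriction sites, which is why each segment carries unique sites at both ends.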
(ii) Preparation of oligodeoxynucleotides

Oligodeoxynucleotides for use in the synthesis of a DNA sequence comprising a gene for Btt are prepared according to the general procedures described by Matteucci et al. (1981) J. Am. Chem. Soc. 103:3185-3192 and Beaucage et al. (1981) Tetrahedron Lett. 22:1859-1862. All oligonucleotides are prepared by the solid-phase phosphoramidite triester coupling approach, using an Applied Biosystems Model 380A DNA synthesizer. Deprotection and cleavage of the oligomers from the solid support are carried out according to standard procedures. Crude oligonucleotide mixtures are purified using an oligonucleotide purification cartridge (OPC, Applied Biosystems) as described by McBride et al. (1988) BioTechniques 6:362-367.

5'-Phosphorylation of oligonucleotides is performed with T4 polynucleotide kinase. The reaction contains 2 µg oligonucleotide and 182 units polynucleotide kinase (Pharmacia) in linker kinase buffer (Maniatis (1982) Cloning Manual, Fritsch and Sambrook (eds.), Cold Spring Harbor Laboratory, Cold Spring Harbor, NY). The reaction is incubated at 37°C for 1 hour.

Oligonucleotides are annealed by first heating to 95°C for 5 min and then allowing complementary pairs to cool slowly to room temperature. Annealed pairs are reheated to 85°C, and the solutions are combined, cooled slowly to room temperature and kept on ice until used. The ligated mixture may be purified by electrophoresis through a 4% NuSieve agarose (FMC) gel. The band corresponding to the ligated duplex is excised, and the DNA is extracted from the agarose and ethanol precipitated.

Ligations are carried out as exemplified by the procedure used for the M segment. M segment DNA is brought to 65°C for 25 min, the desired vector is added and the reaction mixture is incubated at 65°C for 15 min. The reaction is slow-cooled over 1-1/2 hours to room temperature. ATP to 0.5 mM and 2.5 units of T4 DNA ligase salts are added, and the reaction mixture is incubated for 2 hr at room temperature and then maintained overnight at 15°C. The next morning, vectors that had not been ligated to M block DNA were removed upon linearization by EcoRI digestion. Vectors ligated to the M segment DNA are used to transform E. coli MC1061. Colonies containing inserted blocks are identified by colony hybridization with 32P-labeled oligonucleotide probes. The sequence of the DNA segment is confirmed by isolating plasmid DNA and sequencing it using the dideoxy method of Sanger et al. (1977) Proc. Natl. Acad. Sci. 74:5463-5467.

(iii) Synthesis of Segment AM

Three oligonucleotide pairs (A1 and its complementary strand A1c, A2 and A2c, and A3 and A3c) are assembled and ligated as described above to make up segment A. The nucleotide sequence of segment A is as follows:
In Table 4, bold lines demarcate the individual oligonucleotides. Fragment A1 contains 71 bases, A1c has 78 bases, A2 has 75 bases, A2c has 78 bases, A3 has 82 bases and A3c has 78 bases. In all, segment A is composed of 228 base pairs and is contained between an EcoRI restriction enzyme site and one destroyed EcoRI site (5'). (Additional restriction sites within segment A are indicated.) The EcoRI single-stranded cohesive ends allow segment A to be annealed and then ligated to the EcoRI-cut cloning vector, pIC20K.

Segment M comprises three oligonucleotide pairs: M1, 80 bases; M1c, 86 bases; M2, 67 bases; M2c, 87 bases; M3, 85 bases; and M3c, 79 bases. The individual oligonucleotides are annealed and ligated according to standard procedures as described above. The overall nucleotide sequence of segment M is:
In Table 5, bold lines demarcate the individual oligonucleotides. Segment M contains 252 base pairs and has destroyed EcoRI restriction sites at both ends. (Additional restriction sites within segment M are indicated.) Segment M is inserted into vector pIC20R at an EcoRI restriction site and cloned.

As proposed in Figure 3, segment M is joined to segment A in the plasmid in which segment A is contained. Segment M is excised at the flanking restriction sites from its cloning vector and spliced into pIC20K harboring segment A, through successive digestions with HindIII followed by BglII. The pIC20K vector now comprises segment A joined to segment M, with a HindIII site at the splice site (see Figure 3). Plasmid pIC20K is derived from pIC20R by removing the ScaI-NdeI DNA fragment and inserting a HindII fragment containing an NPTI coding region. The resulting plasmid of 4.44 kb confers resistance to kanamycin on E. coli.



Example 3. Expression of synthetic crystal protein gene in bacterial systems

The synthetic Btt gene is designed so that it is expressed in the pIC20R-kan vector in which it is constructed. This expression is produced utilizing the initiation methionine of the lacZ protein of pIC20K. The wild-type Btt crystal protein sequence expressed in this manner has full insecticidal activity. In addition, the synthetic gene is designed to contain a BamHI site 5' proximal to the initiating methionine codon and a BglII site 3' to the terminal TAG translation stop codon. This facilitates the cloning of the insecticidal crystal protein coding region into bacterial expression vectors such as pDR540 (Russell and Bennett 1982). Plasmid pDR540 contains the TAC promoter, which allows the production of proteins, including Btt crystal protein, under controlled conditions in amounts up to 10% of the total bacterial protein. This promoter functions in many gram-negative bacteria, including E. coli and Pseudomonas.

Production of Bt insecticidal crystal protein from the synthetic gene in bacteria demonstrates that the protein produced has the expected toxicity to coleopteran insects. These recombinant bacterial strains in themselves have potential value as microbial insecticides [...] product of the synthetic gene.

Example 4. Expression of a synthetic crystal protein gene in plants



The synthetic Btt crystal protein gene is designed to facilitate cloning into expression cassettes. These utilize sites compatible with the BamHI and BglII restriction sites flanking the synthetic gene. Cassettes are available that utilize plant promoters including CaMV 35S, CaMV 19S and the ORF 24 promoter from T-DNA. These cassettes provide the recognition signals essential for expression of proteins in plants, and they are utilized in micro-Ti plasmids such as pH575. Plasmids such as pH575 containing the synthetic Btt gene directed by plant expression signals are utilized in disarmed Agrobacterium tumefaciens to introduce the synthetic gene into plant genomic DNA. This system has been described previously by Adang et al. (1987) to express the Bt var. kurstaki crystal protein gene in tobacco plants. These tobacco plants were toxic to feeding tobacco hornworms.



Example 5. Assay for insecticidal activity

Bioassays were conducted essentially as described by Sekar, V. et al. supra. Toxicity was assessed by an estimate of the LD50. Plasmids were grown in E. coli JM105 (Yanisch-Perron, C. et al. (1985) Gene 33:103-119). On a molar basis, no significant differences in toxicity were observed between crystal proteins encoded by p544Pst-Met5, p544-HindIII and pNSBP544. When expressed in plants under identical conditions, cells containing protein encoded by the synthetic gene were observed to be more toxic than those containing protein encoded by the native Btt gene. Immunoblots ("western" blots) of cell cultures indicated that those that were more toxic had more crystal protein antigen. Improved expression of the synthetic Btt gene relative to that of a natural Btt gene was seen as the ability to quantitate specific mRNA transcripts from expression of synthetic Btt genes on Northern blot assays.



Claims



1. A synthetic gene designed to be highly expressed in plants comprising a DNA sequence encoding an insecticidal protein which is functionally equivalent to a native insecticidal protein of Bt.

2. A synthetic gene of claim 1 wherein said DNA sequence is at least about 85% homologous to a native insecticidal protein gene of Btt.

3. A synthetic gene of claim 1 wherein said DNA sequence is that presented in Figure 1 spanning nucleotides 1 through 1793.

4. A synthetic gene of claim 1 wherein said DNA sequence is that presented in Figure 1 spanning nucleotides 1 through 1833.

5. A synthetic gene of claim 1 wherein the overall frequency of preferred codon usage within the entire coding region of said synthetic gene is within about 75% of the frequency of codon usage preferred in plants.

6. A synthetic gene of claim 1 wherein the A+T base content of said DNA sequence is substantially equal to the A+T base content found in plant structural genes.

7. A synthetic gene of claim 1 wherein a plant initiation sequence is present at the 5' end of the coding region.

8. A synthetic gene of claim 1 wherein plant polyadenylation signals, comprising those having AATAAA, AATGAA, AATAAT, AATATT, GATAAA, GATAAA and AATAAG motifs, are eliminated in said DNA sequence.

9. A synthetic gene of claim 1 wherein the polymerase II termination sequence, CAN7-9AGTNNAA, is eliminated in said DNA sequence.

10. A synthetic gene of claim 1 wherein CUUCGG hairpins are eliminated in said DNA sequence.

11. A synthetic gene of claim 1 wherein plant consensus splice sites, including 5' = AAG:GTAAGT and 3' = TTTT(Pu)TTT(Pu)T(Pu)T(Pu)T(Pu)TGCAG:C, are eliminated in said DNA sequence.